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The  Kalman  filter  has  been  one  of  the  most  widely 
applied  techniques  in  the  area  of  modern  control,  signal 
processing  and  communication  applications.  The 
implementation  of  a  Kalman  filter  using  the  logarithmic 
number  system  ( LNS )  is  considered.  The  choice  of  LNS  is 
highly  attractive  in  this  digital  filtering  setting  since  it 
offers  high  precision,  wide  dynamic  range,  attractive 
architecture,  and  ease  of  arithmetic  operations.  A 
theoretical  error  analysis  is  presented  concerning  the  mean 
and   variance   of   the   actual    estimations   due   to   a 
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finite-precision  logarithmic  implementation.  Simulation 
studies  are  utilized  to  show  how  closely  the  analytical 
predictions  agree  with  actual  results.  A  comparison  between 
the  finite-precision  conventional  Kalman  filter  and  the 
proposed  scheme  is  performed  in  terms  of  speed  and  accuracy. 

In  addition,  an  adaptive  Kalman  filter  is  considered 
for  which  the  input  and  output  noise  covariances  are 
unknown.  A  new  procedure  is  presented  for  the 
identification  of  the  unknown  covariances  using  the 
autocorrelation  information  contained  in  the  innovation 
error  sequence  and  a  modified  Fibonacci  search.  Real-time 
performance  is  facilitated  through  the  use  of  the  presented 
efficient  high  speed  digital  autocorrelation  algorithms  and 
search  procedures. 

Finally,  the  implementation  of  the  Kalman  filter  using 
systolic  array  architectures  is  investigated.  Systolic 
architectures  have  received  much  attention  recently  as  a 
means  of  achieving  high  data  rates.  Systolic  designs  have 
been  used  to  perform  matrix  operations  making  them  suitable 
for  Kalman  filtering  processing.  Using  an  orthogonal  array 
processor,  a  new  algorithm  is  presented  which  increases  the 
throughput  of  matrix  operations.  This  new  algorithm  will  be 
compared  with  other  systolic  architectures  on  the 
implementation  of  Kalman  filters. 
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CHAPTER  ONE 
INTRODUCTION 


The  theory  of  optimal  filtering  has  application  to  a 
broad  range  of  meaningful  problems.  Within  the  family  of 
known  optimal  algorithms,  the  filtering  technique  known  as 
the  Kalman  filter  has  been  the  most  visible.  It  has  been 
shown  to  be  well  suited  for  digital  computer  implementation 
provided  high  throughput  is  not  a  prerequisite.  This  class 
of  filter  has  penetrated  the  areas  of  control, 
communication,  guidance-navigation,  and  data  processing. 

To  successfully  design  a  Kalman  filter,  and  its 
counterparts,  the  engineer/scientist  must  address  both 
modeling  and  computation  problems.  The  Kalman  filter  is 
intrinsically  a  model  reference  filter,  and  as  a  result,  can 
suffer  from  degraded  performance  if  the  system  model  is 
improperly  configured.  In  order  to  desensitize  the  filter 
to  model  dependent  errors,  an  adaptive  filter  policy  should 
be  considered.  This  leads  to  the  second  problem.  It  has 
been  assumed  that  the  Kalman  filter  would  lend  itself  well 
to  digital  computation  due  to  its  recursive  structure.   This 


has  not  always  been  the  case.  The  computational  burdens 
imposed  by  real-time,  high  speed,  high-precision  Kalman 
filters  can  exceed  a  computers  throughput  capabilities.  In 
addition,  there  is  always  a  controversy  on  whether  floating 
point  or  fixed  point  arithmetic  should  be  used.  Furthermore 
there  is  generally  the  overlooked  problem  of  burst  error 
management  of  filters  operating  in  hostile  environments 
which  must  be  attended  to.  The  additional  computational 
demands  of  an  adaptive  filter  often  exceed  the  performance 
capability  of  conventional  computer  architectures.  This  is 
especially  true  in  those  cases  where  small  wordlength, 
limited  capability  processors  are  found  (viz.  the 
microprocessor).  Rather  than  developing  theory  over  a  real 
number  field,  we  are  proposing  to  represent  numbers  over 
finite  fields  and  compressed  numbering  systems. 

Recent  developments  in  the  area  of  logarithmic  number 
systems  (LNS)  have  proven  it  to  offer  some  significant 
advantages  over  other  number  weighted  systems  like  the  fixed 
or  floating  point.  The  main  advantages  of  LNS  are  the 
extended  dynamic  range  on  which  they  operate  and  the  high 
speed  of  operations  accompanied  by  a  remarkably  regular  data 
flow.  Another  remarkable  feature  of  the  LNS  architecture  is 
its  ability  to  be  efficiently  transferred  into  VLSI. 
Furthermore  LNS  arithmetic  units  are  ideally  suited  to 
systolic  array  data  processing,  thus  increasing  throughput. 


The  organization  of  the  dissertation  is  as  follows: 
Chapter  Two  contains  a  comprehensive  survey  of  the 
literature  on  Kalman  filters  and  logarithmic  number  system. 
In  Chapter  Three,  the  derivation  of  the  discrete-time 
(sampled  data)  optimal  estimator,  known  as  the  Kalman 
filter,  will  be  presented.  Chapter  Four  contains  a 
description  of  the  logarithmic  number  system  and  its 
properties.  In  addition  a  logarithmic  number  processor, 
(FU)  ,  will  be  developed.  Chapter  Five  describes  the 
quantization-error  models  used  in  LNS  arithmetic,  namely, 
input,  coefficient  and  arithmetic  operations.  In  Chapter 
Six,  the  implementation  of  a  Kalman  filter  using  the 
logarithmic  number  system  (LNS)  will  be  considered.  A 
theoretical  error  analysis  will  be  presented  concerning  the 
mean  and  variance  of  the  actual  estimation  due  to  a 
finite-precision  logarithmic  implementation.  Simulation 
studies  will  be  utilized  to  show  how  closely  the  analytical 
predictions  agree  with  actual  results.  In  addition 
comparison  between  the  finite-precision  conventional  Kalman 
filter  and  the  proposed  scheme  will  be  performed  in  terms  of 
speed  and  accuracy.  In  Chapter  Seven,  an  adaptive  Kalman 
filter  will  be  considered  for  which  the  input  and  output 
noise  covariances  are  unknown.  A  new  procedure  will  be 
presented  for  the  identification  of  the  unknown  covariances 
using  the  autocorrelation  information  contained  in  the 
innovation  error   sequence  and  a  modified  Fibonacci  search. 


In  Chapter  Eight,  implementation  of  a  Kalman  filter  using  a 
new  systolic  design  will  be  investigated  and  compared  with 
other  systolic  architectures.  Finally  in  Chapter  Nine,  by 
way  of  summarizing  the  proceeding,  the  significance  of  the 
described  research  will  be  brought  out  and  some  directions 
for  future  research  will  be  indicated. 


CHAPTER  TWO 
SURVEY  OF  LITERATURE 


Kalman  Filters 

In  1960  Kalman  [1,2]  developed  the  linear  estimation 
theory  hereafter  referred  to  as  the  Kalman  filter.  Others 
[3,4]  developed  the  theory  even  further  and  gave  more 
insights  to  the  linear  estimation  problem.  This  estimation 
theory  can  be  formulated  in  terms  of  either  a  time-discrete 
or  time  continuous  model.  Additional  subjects  on  Kalman 
filters  that  have  been  treated  in  the  voluminous  literature 
will  be  presented. 

Correlated  Process  and  Measurement  Noise 

The  white  noise  sequences  have  been  assumed  to  be 
uncorrelated  in  the  formulation  of  the  Kalman  filter.  This 
restriction  is  not  necessary.  However,  the  resulting 
equations  are  considerably  more  complicated  [5-7], 


Wiener  Filter 

Underlying  Wiener  filter  design  [8]  is  the  so  called 
Wiener-Horf  equation,  its  solution  through  spectral 
factorization  and  the  practical  problems  of  synthesizing  the 
theoretically  optimal  filter  from  its  impulse  response.  In 
the  case  of  time-invariant  systems  models  and  stationary 
noises,  the  Wiener  filter  is  shown  to  be  equivalent  to  the 
steady  state  Kalman  filter  appropriate  to  the  given  problem 
[8-10]. 

Random  Sampling  Times 

The  measurement  data  have  been  assumed  to  occur  at 
prespecified  times.  It  can  happen  that  the  time  to  which  a 
measurement  is  related  is  random.  Also,  the  transition 
matrix  and/or  the  observation  matrix  might  contain 
parameters  that  are  random.  These  cases  have  been 
considered  by  several  investigators  including  Tou  [11], 
Rauch  [12],  Gunckel  [13]  and  others. 

Effect  of  Inaccurate  Mathematical  Models  and  Statistics 

The  validity  of  the  estimates  provided  by  this 
estimation  technique  require  an  accurate  mathematical 
description  of  the  dynamical  and  observational  processes. 
The  omission  of  terms  that  actually  enter  these  processes 
can  result  in  extremely  poor  estimates.  If  the  statistics 
(i.e.,    R,    Q)    are   not   accurate   descriptions   of   the 


second-order  moments  of  the  noise  processes  [14],  the 
covariance  matrix  P  might  not  be  a  realistic  measure  of  the 
accuracy  of  the  estimate. 


Performance  (Sensitivity)  Analysis 

The  estimates  of  the  state  are  random  variables.  To 
determine  the  validity  of  the  estimates  and  the  significance 
of  P,  it  is  necessary  to  perform  a  Monte  Carlo  simulation  of 
the  physical  system  under  consideration  115]. 
Unfortunately,  this  is  a  costly  and  time  consuming  process. 
More  efficient  means  of  generating  the  statistical 
information  is  the  mean  analysis  [16]  and  the  covariance 
analysis  [17].  One  particular  use  of  a  covariance 
performance  (sensitivity)  analysis  is  a  systematic  approach 
to  the  filter  tuning  [18-24]  process.  The  basic  objective 
of  filter  tuning  is  to  achieve  the  best  possible  estimation 
performance  from  a  filter  of  specified  structural  form, 
i.e.,  totally  specified  except  for  PQ  and  the  time  histories 
of  Q  and  R.  These  covariances  not  only  account  for  actual 
noises  and  disturbances  in  the  physical  system,  but  also  are 
a  means  of  declaring  how  adequately  the  assumed  model 
represents  the  "real  world"  system.  Once  a  particular 
filter  has  been  tuned,  an  error  budget  [19-24]  can  be 
established.  Essentially,  this  consists  of  repeated 
covariance  analyses  in  which  the  error  sources  in  the  truth 
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model  are  "tuned  on"  individually  to  determine  the   separate 
effects  of  these  sources. 

Square  Root  Filtering 

Measurement  updating  of  the  covariance  matrix  requires 
a  rather  long  wordlength  to  maintain  acceptable  numerical 
accuracy.  The  difficulties  encountered  in  converting  a 
Kalman  filter  tuned  over  a  long  wordlength,  to  an  effective 
algorithm  on  a  small  wordlength  on  line  computer  are  well 
documented  [25].  To  circumvent  these  problems  in  numerics 
inherent  to  the  Kalman  filter  algorithm,  alternate  recursion 
relationships  [25-27]  have  been  developed  to  propagate  and 
update  the  covariance  matrix  in  a  square  root  sense.  The 
square  root  approach  can  yield  twice  the  effective  precision 
of  the  conventional  filter  in  ill-conditioned  problems. 
Another  alternative  is  the  U-D  covariance  factorization 
filter  which  although  is  not  actually  a  square  root  filter 
is  closely  related  to  it. 

Suboptimal  Filtering 

The  dimensions  of  the  matrices  that  are  involved 
sometimes  become  so  large  as  to  render  their  manipulation 
virtually  impossible  in  a  particular  digital  computer. 
Also,  when  the  state  vector  is  large,  it  is  often 
impracticable  to  compute  estimates  for  every  component  and 
it   becomes   necessary   to   consider   only   a   subset  of  the 


components.  To  reduce  the  dimensionality  of  the  over-all 
system  and  still  retain  a  resonable  model,  methods  which  are 
not  optimal  in  the  strict  sense  have  been  devised  [28]. 


Compensation  of  Linear  Model  Inadequacies 

There  is  always  some  discrepancy  between  the 
performance  indication  propagated  by  the  filter  and  the 
actual  performance  achieved  in  realistic  applications, 
because  the  model  embodied  in  the  filter  cannot  be  exactly 
correct.  Such  discrepancy  is  termed  divergence 
[18-24,29-32].  Compensation  techniques  have  been  used  for 
model  inadequacy.  The  first  of  these  methods  is  the 
addition  of  pseudonoise  to  the  assumed  model  and  artificial 
lower  bounding  of  error  covariance  matrix  elements 
[29,30,32],  By  adding  such  fictitious  noise  to  the  dynamic 
model  one  "tells"  the  filter  that  it  should  decrease  its 
confidence  in  its  own  model.  In  the  case  in  which  a  linear 
model  is  indeed  adequate,  but  only  for  a  limited  length  of 
time,  then  limiting  of  effective  filter  memory  and 
overweighting  most  recent  data  [30,33-35]  and  finite  memory 
filtering  [30,36]  methods  are  used.  Finally  extended  Kalman 
filtering  [30,37]  attempts  to  exploit  linear  models  and 
methods  in  the  case  in  which  a  substantially  better 
depiction  of  the  true  system  would  be  in  the  form  of  a 
nonlinear   model.    The   basic   idea   of  the  extended  Kalman 
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filter  is  to  relinearize  about  each  state  estimate   once   it 
has  been  computed. 

Linear  Smoothing  Problem 

There  are  three  classes  of  smoothing  problems,  namely 
the  fixed-interval,  fixed-point  and  fixed-lag  [8,38]. 
Fixed-interval  technique  is  used  for  post-experiment  data 
reduction  to  obtain  refined  state  estimates  of  better 
quality  than  that  provided  by  on-line  filters,  i.e., 
post-flight  analysis  of  a  missile.  Fixed-point  smoothing  is 
considered  when  there  is  a  certain  point  (or  points)  in  time 
at  which  the  value  of  the  system  state  is  considered 
critical.  For  example  conditions  at  engine  burnout  time  are 
critical  to  rocket  booster  problems.  Finally  the  fixed-lag 
technique  is  used  when  it  is  desirable  to  delay  the 
computation  of  a  state  estimate  in  order  to  take  advantage 
of  additional  information.  It  is  particularly  applicable  to 
communications  and  telemetry  data  reduction. 

Observers 

In  some  estimation  problems,  it  may  be  desired  to 
reconstruct  the  state  of  a  deterministic,  linear  dynamical 
system  —  based  on  exact  observations  of  the  system  output. 
For  deterministic  problems  of  this  nature,  stochastic 
estimation  concepts  are  not  directly  applicable.  Luenberger 
[39,40]    formulated   the   motion   of   an   observer   for 
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reconstructing  the  state  vector  of  an  observable 
deterministic  linear  system  from  exact  measurements  of  the 
output.  The  stochastic  observer  unifies  the  concepts  of 
deterministic  Luenberger  observer  theory  and  stochastic 
Kalman  theory,  for  continuous  linear  systems.  Analogous 
results  have  been  derived  for  discrete  systems  [41,42]. 

Adaptive  Estimation 

In  order  to  improve  the  quality  of  the  state  estimates, 
it  would  be  desirable  in  many  instances  to  estimate  a  number 
of  uncertain  parameters  in  the  dynamics  of  a  measurement 
model  simultaneously  in  an  online  fashion.  This  is  often 
termed  combined  state  estimation  and  system  identification 
[43-49].  In  addition,  one  would  like  to  readjust  the 
assumed  noise  strengths  in  the  filter's  internal  model, 
based  upon  information  obtained  in  real  time  from  the 
measurements  becoming  available,  so  that  the  filter  is 
continually  "tuned"  as  well  as  possible.  Such  an  algorithm 
is  often  termed  an  adaptive  or  self-tuning  estimation 
algorithm  [45-48,50,51].  Several  authors  use  the 
information  found  in  the  innovation  sequence  to  estimate  the 
unknown  noise  covariances  [45-48,50]. 

Applications 

The  earliest  applications  of  the   Kalman   filter   dealt 
with  satellite  orbit  determination,  tracking  and  navigation 
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problems.   Tapley  and  Ingram  [52]  used   an   extended   Kalman 

filter  to  estimate  the  state  and  the  unmodeled  accelerations 

acting  on  an  orbiting  space  vehicles.   Cambell  et  al.    [53] 

described   the  use  of  the  filter  for  orbit  determination  for 

the  Voyager  spacecraft  during  its  Jupiter  fly  by.   Dawn   and 

Fitzgerald  [54]  presented  the  application  of  a  Kalman  filter 

in  numerous  phased  array  radars  to  track  satellites,  reentry 

vehicles   and  missiles.   Application  of  Kalman  filtering  to 

spacecraft  range  residual  prediction  was  studied   by  Madrid 

and  Bierman  [55].   They  developed  a  U-D  covariance  factored 

Kalman  filter  to  be  able  to  evaluate  the  validity  of   range 

measurements  to  a  distant  spacecraft  in  near-real  time  so  as 

to  be  able  to  detect  receiving-station  hardware  problems   as 

they  occur.    The   Kalman  filter  is  admirably  suited  to  the 

problem  of  monitoring  measurement   consistency  and   the 

related   statistics.    Additional   study  was  done  on  several 

versions  of  the  extended  Kalman   filter  which   is  used   to 

estimate   the   position,   velocity  and  other  key  parameters 

associated  with  maneuvering  reentry  vehicles   [56].    The 

Kalman  filter   is  also  well  suited  for  application  to  the 

problem  of  anti-aircraft  gun  fire  control.   Berg   [57]   made 

use  of  the  Kalman  filter   theory  to  develop  an  accurate, 

numerically  efficient  scheme  for  estimating   and   predicting 

the   present   and   future  position  of  maneuvering  fixed-wing 

aircraft.   Application  of  modern  estimation   techniques   to 

multisensor  navigation  systems  began  in  the  mid-1960' s. 


13 

Because  the  errors  in  a  typical  navigation  system  propagate 

in   essentially  a   linear  manner  and  linear  combinations  of 

these  errors  can  be  detected  from  external  measurements,  the 

Kalman    filter   is   ideally   suited   for   their   estimation 

[58-61].   Application  of  Kalman   filtering   on   ship  motion 

prediction   and  position  control  [62-64]  have  many  features 

in  common  with  the  tracking  and  navigation  problems.    The 

modeling   of  ship/wave  interactions  requires  the  development 

of  simple,  but  effective,  models  using  signal  processing   or 

identification  techniques.    The  authors  are  concerned  with 

the  Kalman  filter  performance  in  the  context   of   a   control 

system.    For  remote  sensing  applications  [65,66]  the  Kalman 

filter  appears  as  a  part  of  a  signal  processor  that  is   used 

for   offline  analysis.   These  applications  are  not  concerned 

with  real-time  processing.   Nonetheless,  simple   models   are 

used   to   accomplish   the  processing,  and  the  performance  is 

based  on  the  achievement  of  the  general  goals  of  the  system. 

Estimation   and   control   problems   abound   in   industrial 

processes  and  power   systems   [67-71].   These  applications 

impose  the  necessity  for  real-time  operation,  generally  with 

limited   computational   capability,   using  poorly   defined 

models  of  the  process.    Simplicity  and  adaptability  are 

basic  concerns  in  these  applications.   Application  of  Kalman 

filtering  has   been   extended   to   the   geophysics  area.   A 

Kalman  filter  was  developed  as  a  white-noise   estimator   for 

seismic  data  processing  in  oil  exploration  [72].   The  Kalman 
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filtering  approach  was  used  to  obtain  optimal  smoothed 
estimates  of  the  so-called  reflection  coefficient  sequence. 
The  sequence  contains  important  information  about  subsurface 
geometry.  Ruckebusch  [73]  described  an  application  of 
Kalman  filtering  to  a  geophysical  subsurface  estimation 
problem.  Other  areas  where  the  theory  of  optimal  estimation 
has  applications  are  tracer  studies  in  nuclear  medicine, 
statistical  image  enhancement,  estimation  of  river  flows, 
network  and  load  forecasting  [74,75],  classification  of 
vectorcardiograms  and  demographic  models  [76]. 

Logarithmic  Number  System 

In  1971  Kingsbury  and  Rayner  [77]  first  outlined  logic 
hardware  and  software  function  approximations  to  the 
addition  of  two  positive  numbers  in  logarithmic  arithmetic. 
Swartzlander  and  Alexopoulos  [78]  concentrated  on  ROM  based 
hardware  for  the  fastest  addition  but  with  a  wordlength 
limit  of  12  bits.  Later  Lee  and  Edgar  [79]  developed 
efficient  algorithms  to  implement  8  and  16  bit  logarithmic 
arithmetic  on  microprocessors.  They  established  that  unlike 
floating-point  arithmetic,  which  provides  accuracy  at  the 
expense  of  speed,  the  LNS  provides  for  both  and  is  suitable 
for  advanced  DSP  applications.  Kurokawa  et  al.  [80] 
applied  LNS  to  the  implementation  of  digital  filters  and 
demonstrated  that  it  gives  filtering  performance  superior  to 
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that  of  a  floating-point  system  of  equivalent  wordlength  and 
range.  Similar  observation  was  made  by  Swartzlander  et  al. 
[81]  when  applying  LNS  to  Fast  Fourier  Transform  processor. 
The  general  conclusion  of  these  studies  is  that  addition, 
subtraction,  multiplication  and  division  are  very  fast  and 
easy  to  implement  in  LNS.  The  advances  in  semiconductor 
memory  technology  have  renewed  interest  in  the  LNS  over  the 
last  few  years.  A  brief  survey  of  the  literature  in  the  LNS 
will  be  presented. 

Logarithmic  Conversion 

Multiplication  and  division  operations  in  computers  are 
usually  accomplished  by  a  series  of  additions,  subtractions 
and  shifts.  Consequently,  the  time  required  to  execute  such 
commands  is  much  longer  than  the  time  required  to  execute  an 
add  or  subtract  command.  In  an  early  work  by  Michell  [82]  a 
method  of  computer  multiplication  and  division  is  proposed 
which  uses  binary  logarithms.  The  logarithm  of  a  binary 
number  is  determined  approximately  from  the  number  itself  by 
simple  shifting  and  counting  with  no  table  lookups  required. 
Simple  add  (or  subtract)  and  shift  operations  are  all  that 
is  required  to  multiply  or  divide.  However,  since  the 
logarithms  used  are  approximations  of  the  actual  logarithms, 
the  operations  yield  errors.  The  paper  discusses  a  method 
of  finding  approximate  base-two  logarithms  and  an  analysis 
of  maximum  errors   that  may  occur   as   a   result   of 
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approximations.  A  more  vigorous  analysis  of  errors  in  the 
multiplication  and  division  operations  using  binary 
logarithms  is  presented  by  Hall  et  al.  [83].  In  this  paper 
the  authors  discuss  algorithms  for  computing  approximate 
binary  logarithms,  antilogarithms  and  applications  to 
digital  filtering  computations  and  establish  that 
logarithmic  techniques  are  well  suited  for  parallel  digital 
filter  banks  and  multiplicative  digital  filters.  Marino 
[84]  presents  a  parabolic  approximation  method  for 
generation  of  binary  logarithms  with  an  increased  precision 
over  Hall  et  al.  [83],  where  a  review  base-two  logarithm 
and  antilogarithm  computation  using  a  fixed  number  of 
iterations  is  presented.  Their  method  is  claimed  to  yield 
an  improved  accuracy  and  can  be  implemented  as  a  subroutine 
or  using  a  hardware  peripheral  device. 

Addition  and  Subtraction  Algorithms 

Kingsbury  and  Rayner  [77]  were  the  first  to  propose  two 
methods  of  adding  or  subtracting  logarithmically  encoded 
numbers.  In  the  direct  method  of  addition  and  subtraction, 
they  use  approximate  evaluation  of  logarithms.  However, 
their  second  method  is  based  on  table  lookups  using  ROM 
which  leads  to  potentially  faster  operations.  The  paper 
also  indicates  a  possible  method  of  table  reduction.  In 
1975,  Swartzlander  and  Alexopoulos  [78]  unaware  of  the 
previous  work   by  Kingsbury   and   Rayner,    proposed   a 
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sign/logarithm  system  along  with  arithmetic  algorithms 
identical  to  those  in  [77].  Their  paper  suggests  hardware 
architectures  for  arithmetic  units  and  includes  comparison 
of  speeds  with  conventional  arithmetic  units.  Ironically, 
in  a  paper  by  Lee  and  Edger  [79],  the  LNS  was  reinvented  a 
third  time  but  enriched  with  ideas  for  implementation  in  8 
and  16  bit  microcomputers.  A  more  rigorous  treatment  of  LNS 
addition  and  subtraction  algorithms  with  respect  to  storage 
efficiency,  accuracy,  range,  noise  and  array  area  is  found 
in  Lee  and  Edger  [85]. 

Multiplication  and  Division  Algorithms 

In  LNS,  multiplication  and  division  are  achieved  by 
simply  adding  or  subtracting  logarithms  and  are  thus 
straightforward  operations.  These  algorithms  along  with 
examples  and  hardware  implementations  may  be  found  in 
references  77-79  and  85. 

Table  Reduction  Techniques 

Current  algorithms  for  addition  and  subtraction 
operations  in  LNS  require  a  table  lookup  operation.  The 
size  of  the  table  is  governed  by  the  wordlength  size.  A 
16-bit  arithmetic  would  require  64K  of  storage.  The  current 
state  of  the  art  in  high  speed  ROM  and  RAM  is  such  that  LNS 
is  useful  for  wordlengths  up  to  12-13  bits  for  high  speed 
operations  and   16   bits   for   lower   speeds.    Due   to   the 
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discrete  coding  and  nature  of  the  functions  filling  lookup 
tables,  a  large  portion  of  the  values  is  zero.  The  number 
of  zeros  depends  on  the  radix  used  in  the  number  system  and 
the  number  of  fractional  bits  allotted  in  the  word  format. 
Kingsbury  and  Rayner  [77]  suggested  two  ways  of  reducing  the 
storage  capacity  by  employing  read  and  compare  successive 
approximation  process  and  linear  interpolation  technique. 
Lee  and  Edgar  [85]  determined  a  cutoff  value  for  the  address 
below  which  the  table  output  is  zero. 

Extended  Precision  LNS 

The  aforesaid  address  space  limitations  of  contemporary 
high  speed  memories  limit  the  admissible  logarithm  wordlengh 
to  12-13  bits.  To  overcome  this  problem,  Taylor  [86,87] 
suggested  a  modified  interpolation  technique  to  be  useful  in 
increasing  the  wordlength  of  practical  logarithmic 
arithmetic  unit  without  adding  any  significant  hardware 
burden. 

Digital  Filtering  Using  LNS 

An  early  work  by  Kingsbury  and  Rayner  [77]  implemented 
a  second  order  recursive  lowpass  filter  with  a  16  bit 
logarithmic  arithmetic  and  demonstrates  the  improvement  in 
dynamic  range  and  performance  compared  to  16  bit  fixed  point 
arithmetic.  Kurokawa  et  al.  [80]  recently  applied 
logarithmic  arithmetic  to   the   implementation  of  digital 
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filters  and  present  an  analysis  of  roundoff  error 
accumulation  in  the  direct  realization  of  logarithmic 
digital  filter  and  establishes  that  LNS  gives  filtering 
performance  superior  to  that  of  a  floating-point  system  of 
equivalent  wordlength  and  range.  Statistical  roundoff  error 
behavior  of  a  single  logarithmic  arithmetic  operation  may  be 
found  [80],  In  another  paper  by  Sicuranza  [88],  preliminary 
results  on  two  dimensional  filters  implemented  with  LNS  are 
presented. 

FFT  Implementation  with  LNS 

Swartzlander  et  al.  [81]  presented  a  FFT 
implementation  scheme  with  LNS  and  established  that  this 
approach  resulted  in  an  improved  error  performance  for  a 
given  wordlength  over  FFT's  implemented  with  conventional 
fixed  or  floating  point  arithmetic. 

Other  Applications  with  LNS 

Swartzlander  and  Gilbert  [89]  concluded  that 
convolution  units  using  LNS  arithmetic  have  a  figure  of 
merit  twice  that  of  two's  complement  merged  arithmetic  and 
tens  of  orders  higher  than  high  speed  modular  array. 
Besides,  they  claim  that  the  advances  in  VLSI  technology 
offer  a  strong  potential  for  LNS  applications.  Taylor 
[86,87]  presented  an  approach  to  a  fast  floating  point 
arithmetic  unit  employing  LNS.    His  approach  does  not 


require  exponent  alignment,  supports  high  speed  addition  and 
multiplication  and  admits  a  simple  VLSI  realization. 


20 


CHAPTER  THREE 
KALMAN  FILTERS 


Statement  of  the  Problem 

Application  of  optimal  estimation  is  predicated  on  the 
description  of  a  physical  system  under  consideration  by 
means  of  mathematical  models.  Early  work  in  control  and 
estimation  theory  involved  system  description  and  analysis 
in  the  frequency  domain.  Recent  works  have  involved  system 
descriptions  in  the  time  domain.  The  formulation  used 
employs  state-space  notation  which  offers  the  advantage  of 
mathematical  and  notational  convenience.  Moreover,  this 
approach  to  system  description  is  closer  to  physical  reality 
than  any  of  the  frequency-oriented  transform  techniques.  It 
is  particularly  useful  in  providing  statistical  description 
of  system  behavior.  Initially,  the  work  done  in  the  time 
domain  was  concerned  with  continuous  systems,  but  recently 
work  was  extended  to  the  discrete  case  in  which  information 
is  available  or  designed  only  at  specified   time   intervals. 
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With  the  use  of  digital  computers  or  microprocessors, 
discrete  systems  can  be  simulated  easily  and  computation 
burden  is  avoided  considerably. 

A  discrete  stochastic  dynamic  system  can  be  represented 


as 


x 


i 


Fx.  -  +  W,  ,  (3.1) 


Z.=Hx.+v.  (3.2) 

l      ii 


Where 
i  =0,1,2 


x.  =  nxl  state  vector 

i 

F  =  nxn  state  transition  matrix  (constant) 
w.  =  nxl  vector  of  Gaussian  input  white  noise 
z.  -  mxl  output  vector 
H  =  mxn  output  matrix  (constant) 

v.  -  mxl  vector  of  Gaussian  measurement  errors  (white) 

i 

The  above  equations  are  illustrated  in  Figure  (3-1). 


Least  Square  Estimation 

The  linear  least-squares  problem  involves  using  a  set 
of  measurements  z,  which  are  linearly  related  to  the  unknown 
quantities  x,  by  the  expression 

z  =  Hx  +  v 
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Figure  3-1.   A  Discrete  Stochastic  Dynamic  System. 
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where  v  is  a  vector  of  measurement  "noise."  The  goal  is  to 
find  an  estimate  of  the  unknown,  denoted  by  x.  In 
particular,  given  the  vector  difference 

z-Hx  (3.3) 

we  wish  to  find  the  x  that  minimizes  the  sum  of  the  squares 
of  the  elements  of  z-Hx.  The  vector  inner  product  generates 
the  sum  of  squares  of  a  vector.  Thus,  we  wish  to  minimize 
the  scalar  cost  function  J,  where 

J  =  (z-Hx)T  (z-Hx)  (3.4) 

Minimization  of  a  scalar,  with  respect  to  a  vector,  is 
obtained  when 

H   =  0  (3.5) 

ax 

and  the  Hessian  of  J  is  positive  semidef inite. 
Differentiating  J  and  setting  the  result  to  zero  yields 


HTHx  =  HTz  (3.6) 


It  is  readily  shown  that  the   second   derivative   of   J 

with   respect   to   x   is  positive  semidef inite;  and  Equation 

T 
(3.5)  does  indeed  define  a  minimum  when   H  H   possesses   an 

inverse   (i.e.    when   it   has   a  non-zero  determinant);  the 


25 


least-squares  estimate  is 

T   —1  T 

x  -  (H1H)  1H1z  (3.7) 


Estimation  of  Parameters  Using  Weighted  Least-Squares 

A  more  optimal  way  to  estimate  the  state  vector  x  is 
the  weighted  least-squares  estimate .  For  this  estimate  we 
choose  x,  as  the  value  of  x  that  minimizes  the  quadratic  for 

J= . 5 [ ( x-x ' ) TM_1 ( x-x ' ) + ( z-Hx ) TR~1 ( z-Hx ) ]  (3.8) 

M  and  R  are  the  inverse  matrices  of  the  expected  values 
of  (x-x' ) (x-x' )T  and  ( z-Hx) ( z-Hx ) T  respectively.  Note  that 
x'  is  an  estimate  of  the  state  before  the  measurements  were 
made.  With  this  choice  of  weighting  matrices,  the  weighted 
least-squares  estimate  is  identical  to  the 
conditional-expected  value  estimate  assuming  Gaussian 
distributions  of  x  and  v. 

To  determine  x,  consider  the  differential  of  Equation 
(3.8) 

dJ=dxT[M_1(x-x' )-HTR-1(z-Hx) ]  (3.9) 

In  order  that  dJ-0  for  arbitrary  dxT,  the  coefficient  of  dxT 
in  Equation  (3.9)  must  vanish 
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-1   T  -1   *   -1     T  -1 

(M   +H  R   H)x-M   X'+H  R   Z 


(M  1+HTR  1H)x'+HTR  -"-(z-Hx') 


or 


x  -  x'+PHTR  1(z-Hx' ) 


where 


P"1  =  M_1+HTR  XH  (3.10) 


The  quantity  P  in  Equation  (3.10)  is  the   covariance   matrix 
of  the  error  in  the  estimate  x;  that  is,  we  have 

P=E[ (x-x)(x-x)T]  (3.11) 


Since   M   is   the   error   covariance   matrix   before 
measurement,  it  is  apparent  from  Equation  (3.10)  that  P,  the 

error  covariance  matrix  after  measurement,  is   never   larger 

T  -1 
than  M,   since   H  R  H   is  at  least  a  positive  semidefinite 

matrix.   Thus,   the   act   of   measurement,   in   the   average 

decreases    (more   precisely,    it   never   increases)   the 

uncertainty  in  our  knowledge  of  the  state  x. 

Another  noteworthy  property  of  the  estimate  is  the  fact 

that 

E[(x-x)xT]  =  0 

that  is,  the  estimate  and  the   error   of   the   estimate   are 

uncorrelated.   In  this  case  where  x  and  v  are  Gaussian,  this 

implies  that  x  and  (x-x)  are  independent;  in  other  words   no 

improvement  in  (x-x)  can  be  obtained  by  knowledge  of  x  or  z. 


27 
Recursive  Filters 

A  recursive  filter  is  one  in  which  there  is  no  need  to 
store  past  measurements  for  the  purpose  of  computing  present 
estimates. 

Consider  a  discrete  system,  whose  state  at  time  t.  is 
denoted  by  x(t,  )  or  simply  x,  ,  where  w,  is  a  zero  mean  white 
sequence  of  covariance  Q. 

xk  -  Fxk-1  +  wk-l  <3-12> 

Measurements  are  taken  as  linear  combinations  of  the  system 
state  variables,  corrupted  by  uncorrelated  noise.  The 
measurement  equation  is  written  in  vector-matrix  notation  as 

zk  =  Hxk  +  vk  (3.13) 

where,  z^  is  the  set  of  m  measurements  at  time   t,  ,   namely, 

zlk'z2k' zmk'  arran9ed  in  a  vector  form.   In  addition,  H 

is  the  measurement  matrix  at  time  t,  ;  it  describes  the 
linear  combinations  of  state  variables  which  comprise  z.  in 
the  absence  of  noise.  The  dimension  of  the  measurement 
matrix  is  mxn,  corresponding  to  m-dimensioned  measurements 
of  a  n-dimensioned  state.  The  term  v.  is  a  vector  of  random 
noise  quantities  (zero  mean,  covariance  R)  corrupting  the 
measurements . 

Given  a  prior  estimate  of  the  system  state  at  time   t,  , 
denoted  xk(-)  we  seek  an  update  estimate,  x,(+)  based  on  use 


28 

of  the  measurement,  z,  .  In  order  to  avoid  a  growing  memory 
filter,  the  estimate  is  sought  in  the  linear  recursive  form 

xk(+)=K£xk(-)+Kkzk  (3.14) 

where  K/  and  K,  are  time-varying  weighting  matrices,  as  yet 
unspecified.  Although  the  following  deviation  is  for  an 
assumed  recursive,  single  stage  filter,  the  result  has  been 
shown  to  be  the  solution  for  a  more  general  problem.  If  w, , 
vk,  are  Gaussian,  the  filter  we  will  find  is  the  optimal 
multi-stage  filter;  a  nonlinear  filter  cannot  do  better  [2]. 


Discrete  Kalman  Filters 

It  is  possible  to  derive  the  Kalman  filter  by 
optimizing  the  assumed  form  of  the  linear  estimator.  An 
equation  for  the  estimation  error  after  incorporation  of  the 
measurement  can  be  obtained  from  Equation  (3.14)  through 
substitution  of  the  measurement  Equation  (3.13)  and  the 
defining  relations 

xk(+)=xk+xR(+)  (3.15) 

xk(-)=xk+xR(-)  (3.16) 

The  result  is 
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xk(+)=(Kk+KkH_I)xk+Kkxk(_)+Kkvk  (3.17) 

By  definition  E[v,  ]=0.  Also  if  E[(x.(-)]-0,  this  estimator 
will  be  unbiased  for  any  given  state  vector  x,  only  if  the 
term  K/+K.H-I  is  zero.   Thus  we  require 

Kk  =  MkH 

and  the  estimator  takes  the  form 

xk(+)=tI-KkH]xk(-)+Kkzk 
or  alternatively 

A  A  /«, 

xk(+)=xk(-)+Kk[zk-Hxk(-)]  (3.18) 

The  corresponding  estimation  error  is  from  Equations  (3.13), 
(3.15) ,  (3.16) ,  and  (3.18) 

xk(+)=[I-KRH]xk(-)+Kkvk  (3.19) 

Using  Equation  (3.19)  the  expression  for  the  change  in  the 
error  covariance  matrix  when  a  measurement  is  employed  can 
be  derived.   From  the  definition 

Pk(+)=E[xk(+)xk(+)T] 

Equation  (3.19)  gives 

Pk(+)=E{(I-KkH)xk(-)txk(-)T(I-KkH)T+vkTKkT]+ 
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+Kkvk [ xk ( " } T ( I_KkH ] T+vkTRkT ] } 
By  definition 

PR(-)=E[xk(-)xk(-)T] 

V^Vk^ 
and,  as  a  result  of  measurement  errors  being  uncorrelated 

E[xk(-)vkT]=E[vkxk(-)T]=0 
Thus 

Pk(+)=[I-KkH]Pk(-)[I-KkH]T+KkRKkT  (3.20) 


The  criterion  for  choosing  Kk  is  to  minimize  a  weight 
scalar  sum  of  the  diagonal  elements  of  the  error  covariance 
matrix  Pk(+).   Thus,  for  the  cost  function  we  choose 

Jk=E[xk(+)  Sxk(+)]  (3.21) 

where  S  is  any  positive  semidefinite  matrix.  The  optimal 
estimate  is  independent  of  S;  hence,  we  may  as  well  choose 
S=I,  yielding 


Jk=Trace[Pk(+)] 
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This  is  equivalent  to  minimizing  the  length  of  the 
estimation  error  vector.  To  find  the  value  KR  which 
provides  a  minimum,  it  is  necessary  to  take  the  partial 
derivative  of  JR  with  respect  to  KR  and  equate  it  to  zero. 
Use  is  made  of  the  relation  for  the  partial  derivative  of 
the  trace  of  the  product  of  two  matrices  A  and  B  (with  B 
symmetric ) , 


■2-  [  Trace  (ABAT)  ]  =  2AB 
3A 

From  Equations  (3.20)  and  (3.21)  the  result  is 

-2[I-KkH]Pk(-)HT+2KkR=0 

Solving  for  K,  , 

Kk=Pk(-)HT[HPk(-)HT+R]_1  (3.22) 


which  is  referred  to  as  the  Kalman  gain  matrix.  Examination 
of  the  Hessian  of  Jk  reveals  that  this  value  of  Kk  does 
indeed  minimize  Jk . 

Substitution  of  Equation  (3.22)   into   Equation   (3.20) 
gives,  after  some  manipulation, 

Pk(+)=Pk(-)-Pk(-)HT[HPk(-)HT+R]_1HPk(-) 


=[I-KkH]Pk(-) 
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which  is  the  optimized  value  of  the  updated  estimation  error 
covariance  matrix. 

Thus  far  we  have  described  the  discontinuous  state 
estimate  and  error  covariance  matrix  behavior  across  a 
measurement.  The  extrapolation  of  these  quantities  between 
measurements  is 

xk(~)=Fxk-l(+) 
Pk(-)=FPk_1(+)FT+Q 

The  equations  of  the  discrete  Kalman  filter  are 
summarized  in  Table  (3-1).  Nevertheless  the  application  of 
these  equations  requires  knowledge  of  the  input  and  output 
noise  covariances  statistics,  Q  and  R  respectively.  Figure 
(3-2)  illustrates  these  equations  in  block  diagram  form. 
The  Kalman  filter  to  be  implemented  appears  outside  the 
dashed-line  box.  In  the  linear  discrete  Kalman  filter, 
calculations  of  the  covariance  level  ultimately  serve  to 
provide  K.,  which  is  then  used  in  the  calculation  of  mean 
values  (i.e.  the  estimate  x,  ).  There  is  no  feedback  from 
the  state  equations  to  the  covariance  equations. 
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Table  3-1.   Summary  of  Discrete  Kalman  Filter  Equations 


System  Model 
Measurement  Model 


xk=Fxk-i+wk-i'  wk~N<°'Q> 


zk=Hxk+vk' 


vk~N(0,R) 


Initial  Conditions 
Other  Assumptions 


E[xQ]=x0,  E[ (Xq-Xq) (x0-x0)  ]=Pq 
E[w.v.T]=0  for  all  j,k 


Extrapolation 


State  Estimate 
Error  Covariance 


xk(-)=Fxk-l(+) 
Pk(-)=FPk_1(+)FT+Q 


State  Estimate  Update    x,  (  +  )  =x,  (  -  )+K,  [  z, -Hx,  (  -  )  ] 
Error  Covariance  Update  P, ( + )=[ I-K, H]P, ( - ) 
Kalman  Gain  Matrix      K,=P,  (  -  )HT[  HP,  (  -  )HT+R]~ 


34 


r 


UJ 


~i 


_l        oo 


f     5 

L 


H 


< 


c 
6 

H 
(0 
IMS 

OJ 
*J 

OJ 

IJ 
u 
Ifl 


X) 

c 

(0 


QJ 
T3 
O 

a 

01 

*J 

W 
>1 

to 


I 

m 
V 

p 

Cn 

■H 


J 


35 


Kalman  Filters  with  Deterministic  Inputs 

When  the  system  under  observation  is  excited  by  a 
deterministic  time-varying  input,  u,  whether  due  to  a 
control  being  intentionally  applied  or  a  deterministic 
disturbance  which  occurs,  these  known  inputs  must  be 
accounted  for  by  the  optimal  estimator.  The  dynamic  system 
containing  deterministic  inputs  can  be  described  as 

xi  "  Fxi-1  +  wi-l  +  ui-l 
Z.  =  Hx.  +  Vi 

In  order  for  the  estimator  to  be  unbiased  the  state  estimate 
update  is  given  as  follows: 

xk(+)=Fxk_1(+)+uk_1+Kk[zk-HFxk_1(+)-Huk_1]  (3.23) 

By  subtracting  x  from  x,  it  is  observed  that  the  resultant 
equations  for  x  are  precisely  these  obtained  before.  Hence, 
the  procedures  for  computing  P,K,  etc.,  remain  unchanged. 


CHAPTER  FOUR 
LOGARITHMIC  NUMBER  SYSTEM 


Various  number  systems  have  been  in  use  for  signal 
processing.  Traditionally,  when  a  large  dynamic  range  and 
high  precision  are  required,  data  are  given  a  floating  point 
representation.  The  problem  with  floating  point  arithmetic 
is  that  operations  are  slow  compared  to  their  fixed  point 
counterparts.  On  the  other  hand,  fixed  point  arithmetic 
offers  high  speed  in  the  expense  of  dynamic  range  and 
precision.  Recent  developments  in  the  area  of  logarithmic 
number  systems  ( LNS )  have  proven  it  to  offer  some 
significant  advantages  over  other  number  weighted  systems 
like  the  fixed  or  floating  point.  The  main  advantages  of 
LNS  are  the  extended  dynamic  range  on  which  they  operate  and 
the  high  speed  of  operations  accompanied  by  a  remarkable 
regular  data  flow. 
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LNS  Representation 

In  a  floating  point  system,  a   real   number   X   can  be 

approximated  as 

+e' 
X=+mxr   x;    l/r<|mx|<l  (4.1) 

where  r  is  the  radix,  mx  is  the  unsigned  M-bit  mantissa   and 

ex  is  the  unsigned  E-bit  exponent.   In  an  LNS  environment,  a 

real  X  is  represented  as 

+e 
X=±r   x;    ex=ex+Ax;    Ax=logr(mx) 

where  in  practice  X  is  coded  as  a  (N+2)-bit  word.  The  first 
two  bits  represent  the  sign  of  X.  The  N-bit  exponent  e  is 
represented  as  a  N-bit  word  consisting  of  F  fractional  bits 
and  N-F=I  integer  bits. 

By  distributing  the  appropriate  number  of  bits  for  the 
integer  and  fractional  parts,  both  large  dynamic  range  and 
high  precision  can  be  achieved.  More  specifically,  this 
system  is  characterized  by 

2"F(2N-1)   21 
I X I    =rz      u   1,=r2 
1  'max 


ix,  .  ..-a-'ta"-!).^1 

1  '  mm 


2-F+l   N  ,,   ,1+1 
RA=rz     {Z      1}*rZ 
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where  lxlmax  and  ' X ' min  are  the  lar9est  and  smallest 
positive  numbers  in  LNS,  respectively.  In  addition,  RA  is 
the  range  of  this  system  (the  ratio  of  the  largest  to  the 
smallest  number). 


LNS  Arithmetic  Operations 

Following  the  previously  adapted  number  representation, 
arithmetic  in  LNS  is  described  below.  For  the  fastest 
execution  of  operations  like  addition  and  subtraction,  table 
lookup  techniques  have  been  employed  to  support  them. 

Multiplication 

In  LNS,  the  logarithm  of  the  product  of  two  numbers   is 

given   by  the   sum  of   their   logarithms.    Specifically, 

multiplication  in  this  system  is  performed  in  the   following 

manner. 

e      e,      e 
For  A=r   ,  B=r   ,  C=r   ,  S=sign  bit 

C=AB;   e^ea+ea;   Sc=Sa©eb 

where  ©  denotes  an  "exclusive-or "  operation. 

Overflow  occurs  when  the  magnitude  of  the  result 
exceeds  the  largest  number  which  can  be  represented  for  the 
given  word  format.   Overflow  and  underflow  can   be   easily 
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detected,   by  a   simple   comparison  with   the   largest  and 
smallest  numbers  representable  in  the  system. 

Addition  and  Subtraction 

Addition  and  subtraction   in   this   number   system  are 

defined    in    terms    of   the   identity  A+B=A( 1+B/A) .    In 

particular,  addition  and  subtraction  in  LNS  are  performed  in 

the  following  manner, 
e      e,     e 
For  A=r   ,  B=r   ,Or   ,  S=sign  bit 

and  assuming  without  loss  of  generality  that  A>B 


C=A+B;   ec«-ea+*r(v);   v=eb~ea;   #v=logr  ( l  +  rv )  ;   Sc=Sa 


OA-B;   ec*-ea+0r(v);   v=eb-ea;   0v=logr(l-r  );   Sc=Sa 

The  realization  of  *  and  ©  is  achieved  through  the  use 
of  table  lookup  operations  from  high-speed,  high-density 
RAMs  or  ROMs.  Here,  A  N-bit  address  would  be  presented  to  a 
memory  unit  and  the  value  of  *  and  0  would  be  obtained  as 
direct  table  lookup.  Based  on  contemporary  high-speed 
memory  chip  limitations,  the  wordlength  of  v  can  be 
estimated  to  be  on  the  order  of  12  bits.  PLAs  have  been 
also  used  in  place  of  ROMs  in  order  to  reduce  table 
requi  rements. 
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A  Hybrid  Floating  Point  Logarithmic  Arithmetic  Unit 

When  a  large  dynamic  range  and  high  precision  are 
required,  data  must  often  be  given  a  floating  point 
representation.  Although  standardization  is  on  the  way, 
there  remains  today  many  floating  point  formats.  The 
problem  with  floating  point  arithmetic  is  that  operations 
are  slow  and  complex.  Furthermore,  floating  point  addition 
can  cause  a  special  problem.  The  time  it  takes  to  perform  a 
floating  point  addition  can  be  markedly  dependent  upon  the 
relative  values  of  the  data  to  be  added.  As  a  result, 
developing  efficient  real  time  code  in  floating  point  is 
difficult.  In  addition,  the  multiplier  and  addition  paths 
are  sufficiently  different  so  as  to  demand  two  separate  data 
paths  and  hardware  subsystems.  As  a  result,  the  utilization 
rate  of  the  hardware  floating  point  unit  can  be  as  low  as 
50%. 

Below  we  present  a  variation  of  the  floating  point 
arithmetic  unit.  It  incorporates  the  precision  and  dynamic 
range  of  the  floating  point  system  while  eliminating  the 
disadvantages  of  high  overhead  and  reduced  throughput  due  to 
the  exponent  alignment  requirement.  This  new  concept  shall 
be  referred  to  as  the  Florida  University  Floating  (Point) 
Unit  or  (FU)2  [87] . 
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Floating  Point  Format 

In  a  floating  point  environment,  a  real  number  X  can  be 
approximated  by  Equation  (4.1).  Arithmetic  in  the 
floating-point  system  is  performed  in  stages. 
Multiplication  is  a  multistep  process  given  by 

1.  Multiply  mantissas  and  add  exponents. 

2.  Post-normalize  mantissa  and  round  result. 

3.  Adjust  exponent  if  required. 

In  a  commercial  floating-point  adder/subtractor  unit, 
up  to  one-third  of  an  arithmetic  cycle  can  be  consumed  in  an 
exponent  alignment  process.  In  addition,  the  length  of  time 
required  to  complete  an  exponent  alignment  is  data  dependent 
and  therefore  variable.  For  example,  in  a  LSI-11,  exponent 
alignment  can  take  up  to  7  //s  with  a  basic  mantissa  add  time 
of  42  //s.  Formally,  in  a  floating-point  add,  the  following 
steps  are  taken: 

1.  Align  exponents  and  shift  mantissas  accordingly. 

2.  Add  mantissas. 

3.  Post-normalize  mantissa  and  round  result. 

4.  Adjust  exponent  if  required. 

(FU)2  unit 

The  floating  point  logarithmic  processor,  shown  in 
Figure  (4-1),  consists  of  a  floating  point  to  LNS  encoder,  a 
LNS  arithmetic  unit,  and  a  LNS  to   floating   point   decoder. 
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2 
Several   key  observation   can  be  made  about  the  (FU)   Unit. 

They  are 

1.  No  exponent  alignment  is  required  for  addition. 

2.  The  multiplier  is  a  subset  of  the  adder. 

3.  Independent  of  the  values  of  X  and  Y,   all   adds  will 
occupy  the  same  amount  of  time. 

2 
As  a  result,  both  a  fast  and  low  overhead  (FU)   unit 

can   become   a  reality  if  the  mapping  denoted  *  and  Y  can  be 

mechanized  efficiently  as  a   table   lookup   operation.    For 

r-2,   the   computed   address   v=b-a   can  be   presented  to  a 

preprogrammed  memory  unit  which  responds  with  the   value   of 

*( v )=log2 ( 1+2V ) .     In   order   to   be   consistent   with   the 

addressing  space  limitation  found  in  modern  memory  units,   v 

is   currently  limited  to  about  12  bits  of  precision.   Taylor 

was  able  to  extend  this  figure  using  a  linear   interpolation 

scheme    [86].     Lee   and   Edgar   also   introduced   a   data 

compression  policy  which  ignored  all   addresses  which  map, 

under  *,  into  zero  [79].   Along  a  similar  line  of  reasoning, 

it  can  be  argued  that  the  mapping  Y  and  Y~   can   also   be 

performed  as  a  table  lookup  operation. 

The  LNS  arithmetic  unit   contains   both   an   adder   and 

multiplier.    This   is   unlike  a  conventional  floating  point 

system  where  the  adder  and  multiplier   paths   are   separate. 


Nested   multiply  and  add  operations  can  be  interleaved  among 

2 
several  (FU)   units.   In  a  conventional  floating-point   unit 

with  distinct   multipliers  and  adders,  a  utilization  factor 
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as  low  as  50%  can  occur.   The  (FU)   units   can  be  made   to 
internally   switch   data   paths   so   as   to   support   both 


operations  (i.e.   100%  utilization).   This   can   be   a  very 

2 
important   property   if   the   (FU)   is  to  be  used  to  support 

systolic  or  other  data  path  determined  architectures. 


CHAPTER  FIVE 
ERROR  MODELS  FOR  LNS  ARITHMETIC 


The  implementation  of  filters  with  digital  devices 
having  finite  wordlength  introduces  unavoidable  quantization 
errors.  There  are  three  main  sources  of  quantization  error 
that  can  arise:  input  quantization,  coefficient 
quantization  and  quantization  in  arithmetic  operations.  The 
development  of  quantization  error  models  for  LNS  arithmetic 
will  be  presented  with  special  emphasis  on  input 
quantization.  Once  a  model  to  represent  input  quantization 
error  has  been  developed,  models  to  represent  the  other  two 
types  of  error  can  be  easily  be  obtained. 

Input  Quantization 

A  quantizer  can  be  viewed  as  a  nonlinear  mapping  from 
the  domain  of  continuous-amplitude  inputs  onto  one  of  a 
countable  number  of  possible  output  levels.  In  this  error 
analysis  the  input  is  presented  as  an  infinite  precision 
floating  point  number  and  it  is  mapped  to  LNS  with   the   use 
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of  a  floating  point  to  LNS  encoder,  illustrated  in  Figure 
(4-1).  The  only  source  of  error  in  forming  e  is  the  one 
resulting  from  the  logarithmic  mapping  m  -»  log  m  which  is 
performed  as  a  memory  table  lookup  operation.  The  proposed 
error  analysis  model  is  illustrated  in  Figure  (5-1).  Here, 
the  two  paths  associated  with  the  formation  of  Ax  and  its 
finite  wordlength  approximation  are  shown  as  parallel  paths. 
Upon  receiving  the  real  mantissa  m  ,  the  lower  path  provides 
the  ideal  mapping  Ax=log  m  while  the  upper  path  consists  of 
an  input  quantizer  (Q  ,  which  provides  a  discrete   value   of 

A  "k  it 

m   ),   an  ideal  mapping  of  m   into  L=log  m   and  finally  an 

X  X  L   X 

output  quantizer  (Q  ,  which  provides  the  machine  version  of 
L,  namely  Ax  »RND(L). 

The  input  and  output  quantization  errors  can  be  defined 
as 

EI=mx-mx*;    E0=logr (mx* )-[ logr ( mx* ) ]* 

where  E^  and  E_  are  uniform  white  noise  sequences  possessing 
the  following  statistical  properties: 

E[El]=E[Eo]  =  0;  E[EOkEOj]  =  (c3o/12)5kj 


E[mxEI]=E[(logr(mx*))Eo]=0;       E[  E^Ej ..  ]  =  (  q^/12  )  5R  . 

-Nj  -N 

Here,  qx=2    and  qQ=2    ,  where  Nj  and  NQ  are   the   numbers 

of  bits  available  at  the  input  and  output  of  the  logarithmic 
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►  E 


Ax 


Figure  5-1.   Input  Quantization  Error  Model. 
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mapping  L.  In  addition  we  assume  that  m  is  uniformly 
distributed  over  [l/r,l).  The  final  error  metric  E  is 
parameterized  by  D=E  /m  ,  as 

E  =  Ax  -Ax  =  logr(mx+EI)+E0-logrmx  =  logr(l+D)+E0 

Through  the  application  of  the  theory  of  functions  of  random 
variables  [90],  the  probability  density  function  (p.d.f.) 
fD(D)  is  shown  to  be  given  by 

00 

fD(D)=   J  |mx|  f      (Dmx,mx)  dmx ;   -q,  <  D  <  qj 


Direct  evaluation  of  fD(D)  results  in 


4DZ    4q. 


for  -qz    <  D  <  -qz/2 


fD(D)  = 


«q, 


for  -qx/2  <  D  <  qj/2 


4D     4q. 


for   qx/2  <  D  <  qj 


Defining  P=logr(l+D),  then  D=rP-l,  and  f  (P)  satisfies 


fp(P)' 


fD<D>   P 


|3P| 

I  3D|D 


-r   lnr  fD(D)=rr  lnr  f  ( r  -1 ) ;   lnr=log 
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Assuming  that  P  and  E_  are  statistically  independent  random 
variables,  the  p.d.f.  of  the  final  error  E-P+E_  can  be 
computed  directly  using 


f  (E)=   J  f  (P)  f   (E-P)  dP  (5.1) 


where  the   evaluation   of   (5.1)   depends   on   the   relative 
magnitude  of  the  input  and  output  quantization  step. 

Consider  the  case  where  the  input  quantizer  offers 
lower  precision  than  the  output  quantizer  can  provide  (i.e.: 
N  <NQ  or  q  >q  ) ;  then  fE(E)  is  given  by 


logr(l-qI/2)  logr(l+qI/2) 

J  A(P)  f   (E-P)  dP   +       J  B(P)  f, 
E0 

logr(l-qi)  logr(l-qi/2) 


f-(E)-      J  A(P)  f   (E-P)  dP   +      J  B(P)  f   (E-P)  dP  + 
E  E0  E0 


logr(l+qI) 

+     J  A(P)  f   (E-P)  dP 

""0  (5.2) 

logr(l+qi/2) 

where 

A(P)-lnr  rP( ^ 5-—);   B(P)  =  (3/4q_)  lnr  rP 

4<rF-ir  4q_  Z 


A  direct  evaluation  of  (5.2)  results  into  a  complicated 
piecewise  continuous  expression  consisting  of  seven 
subintervals.  The  final  error  density  function  having  a 
trapezoidal  shape  is  represented  by  the  solid  line  in  Figure 
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(5-2).  Two  distinct  first  order  approximations,  one  for 
each  side,  can  be  used  to  simplify  the  expression.  The 
resulting  approximated  p.d.f.  of  the  final  error  is  plotted 
in  Figure  (5-2)  outlined  by  the  dotted  line.  In  addition 
experimental  results,  outlined  by  the  dashed  line,  verify 
the  theoretical  analysis. 

Similar  results  can  be  achieved  considering  the  other 
case,  where  the  input  quantizer  offers  higher  precision  than 
the  output  quantizer  (i.e.:  nt>n0  or  3t<(W  '  Motivated  by 
the  fact  that  q_  is  the  dominating  factor,  an  investigation 
was  made  concerning  the  necessary  and  sufficient  conditions 
under  which  the  final  error  p.d.f.  coincides  with  the 
output  quantization  error  p.d.f.  with  a  minimum  error.  In 
order  words,  how  many  more  bits  of  precision  in  the  input 
quantizer  will  bring  the  two  errors  at  comparable  levels? 
As  a  basic  criterion  of  comparison  the  mean  square  error 
criterion  was  used. 

The  mean  square  error  of  the  output  quantizer  error   is 

2   2 
o0**qQ/12        and   let    e   represent   the   maximum  allowable 

percentage  error.   Then  a  relationship  between  the  number  of 

input   and   output  bits  can  be  found  to  satisfy  the  equation 


E[[(logr(mx*))*-logr(mx)]2]  -  E[ ( logr (mx* )-logr (mx) ]2 ]  -  ea2, 

2     2 
or   E[P  ]"taQ,      where   P   has   been  defined   previously  as 


P«logr(l+EI/mx) .   Using  (5.2),  then 
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Figure  5-2.   Final  Error  Probability  Density  Function. 
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P,p2,         .    ,(y2-yl'        ^iJlHj    ,    yl(y2-yl'i, 

E[P    ]    =    A    [ 5 3 +   g ]  + 

(x^-x.)         2x1(x1-x2)         x1(x1-x2) 

+  a  [ — g ^ +  3 1+ 

2  2 

+   r    [z1(lnz1)    -z2(lnz2)    -2z1lnz1+2z2lnz2+2z1-2z2) 

where 

A= ^ ;    2= 5 — ;    r= 5 

(Y2-y^  (x2_xl)  (lnr)z4qi 


y1-logr(l-qI);    x1=logr ( 1+qj ) ;    z1=(l+qI/2) 
y2-logr(l-qI/2);  x2=logr ( l+q:/2 ) ;  z2=(l-qI/2) 

The  resulting  theoretical  mean  square  error  of  the  final 
error  is  shown  in  Figure  (5-3),  designated  by  stars,  for  4, 
8,  12  output  bits.  In  addition,  experimental  results 
designated  by  circles  verify  the  theoretical  ones.  Finally, 
the  straight  lines  represent  the  mean  square  error  of  the 
uniform  output  quantization  error.  As  it  can  been  seen  from 
Figure  (5-3),  three  extra  bits  of  precision  in  the  input 
quantizer  are  required  to  make  the  final  error  density 
function  comparable  to  the  output  quantization  error  density 
function.  In  addition,  one  bit  of  precision  in  the  input 
quantizer  is  gained  since  the  leading  bit  in  the  mantissa  is 
always  one,  thus  reducing  the  number  of  extra  bits  to  two. 
Using  this  assumption  the  effect  of  Q   is  ignored   and   each 
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Figure  5-3.   Mean  Square  Error  of  Final  and  Output  Error 
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infinite   floating-point  measurement  can  be  interpreted  as  a 

LNS  word  with  a  fractional  exponent  as  follows: 
* 

z  =r  z;   e  =e  +  Ax  ;    Ax  =Ax+w;    Ax-log  (at  ) 
z   x  L   x 

where  w  is  a   uniform  white   sequence   with   the   following 
statistical  properties: 

E[wk]=0;   E[wkwj]=(q0/12)Skj;   E[AxRwk]=0 

The  simplified  input  quantization  model   is   illustrated   in 
Figure  ( 5-4 )  . 


Coefficient  Quantization 

ea 
Any  coefficient  a=r    is   represented   as   a   quantized 

coefficient   a   when   represented   in   a   finite  wordlength 

it 

register.   The  error  Ae  =e  -e   is   inherently  deterministic 

cl    d    cl 

in  nature.    Thus  we   model   the   quantized  coefficient  as 

e  =e  +Ae  .   It  should  be  noted  that  |Ae  l<q/2,   where   q   is 
a  a   a  a 

the  quantization  step. 


Quantization  in  Arithmetic  Operations 

We  assume  throughout  that  no  overflow  occurs  during  any 

LNS  arithmetic  operation.  Multiplication  does  not  introduce 

quantization   errors.  However,    during   addition   and 

subtraction  quantization  errors   are   unavoidable   due   to 
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memory  wordlength  limitations.  The  addition/subtraction 
quantization  process  can  be  adequately  represented  using  the 
model  in  Figure  (5-5).  Here  ex  and  ey  represent  the  finite 
wordlength  LNS  exponents  to  be  added.  Let  the  symbol  © 
represent  logarithmic  addition.  For  the  operation  of 
addition  the  following  sequence  of  events  takes  place.  For 
Z=X+Y  and  assuming  w.l.o.g.   that  ex>e  ,  it  follows  that 

e      e    e 

Z  =  r  z  =  r  X+r  * 

or 


e   =  e  ©e   =  e  +*(v);  *(v)=log  (1+r  );  v=e  -e 

The  quantized   addition   e    is   obtained   by   rounding   the 
fractional  part  of  e   and  can  be  modeled  as 

ez  =  (ex®ey)q  =  (ex®ey)+w  =  ( ex+w)®( ey+w) 

where  w  is  a  white  sequence  uncorrelated  with  ex  and  e  . 
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Figure  5-4.   Simplified  Input  Quantization  Model. 


ex  O 


e  O- 
y    ^ 


+v 


e 


_ 


Figure  5-5.   Addition  Quantization  Model 


CHAPTER  SIX 
LOGARITHMIC  KALMAN  FILTERS 


LNS  Kalman  Filters 

A  discrete  stochastic  dynamic  system  containing 
deterministic  inputs  can  be  represented  as 

xk+l  "  Fxk  +  uk  +  wk  ^-^ 

Zk  -  Hxk  +  vk 

Given  the  a  priori  estimate  of  the  initial  state  xQ  and 
the  state  covariance  pq(~)  anc*  given  the  a  priori 
statistical  information  of  the  input  noise  covariance  Q  and 
output  noise  covariance  R,  an  estimate  of  the  state  of  the 
system  defined  by  (6.1)  is  obtained  sequentially  for 
k-1,2,3...   for  the  standard  Kalman  filter 

Xk+1  =  Akxk  +  uk  +  Ckzk'   x0  <6-2> 


where 
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Ak=(F-FKRH);   Ck=FKR 

Further  A,  and  C,  are   assumed   to  be   precomputed  with   a 
high-precision   computer.    If   the  arithmetic  operations  in 

(6.2)    are    implemented    using  theoretically    ideal 
infinite-precision   arithmetic,    then   the   mean   and   the 

variance  of  the   estimation   error  are  well   known   to   be 
respectively, 

E[xR-xk]  =  0  (6.3) 


E[(xk-xk)(xk-xk)T]  «  Pk(-)  (6.4) 


The  LNS  implementation  of  the  Kalman  filter   (6.2)   can 
be  presented  as 

exk+1-(eAk©exk)©euk©(eCk«ezk) ,   exQ  (6.5) 

where  ©  and  ©  designate  LNS  matrix   multiplication   and   LNS 
vector  addition,  respectively. 


Theoretical  Error  Analysis  in  LNS 

We  consider  here  the  effect  of  LNS  roundoff  arithmetic 
since  in  a  digital  environment  real  numbers  must  be  replaced 
by  their  finite  wordlength  equivalent.  In  this  work,  the 
(N+2)-bit  sign  magnitude  LNS  data  format  described  in 
Chapter  Four  will  be  employed.    We  will   assume   that   the 
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number  of  integer  bits,  I,  is  properly  selected  so  no 
overflow  will  occur  during  various  arithmetic  operations. 
When  the  filter  (6.5)  is  implemented  on  a  finite-precision 
machine,  the  standard  performance  measures  (6.3)  and  (6.4) 
will  be  altered  because  of  input,  coefficient,  and 
arithmetic  quantizations.  We  predict  these  anticipated 
degradations  under  the  assumption  that  the  models 
representing  various  quantizations,  as  developed  in  Chapter 
Five,  are  correct. 

Let  ex,  .  denote  the  LNS  actual  estimate  of   the   state 

.  .  *    *       * 

x.  -  given  the  quantized  measurements  ezn,  ez. , . . . ,ez,  ;  this 

is   the   estimate   that   is    actually    computed    by    the 

finite-precision  implementation  of  (6.5).   By  replacing  each 

finite  precision  operation  in  (6.5)  by   a   quantizer   model, 

the   following   difference   equation   for   the   actual  state 

estimate  is  obtained. 

exk+1-[ (eAR8exk)  +eBk]©{ [euk+eAk+eBk]© 
©[(eCk«ezk)  +eAk+eBk]},  ex*Q 


ez*  «  ezk  +  ezk 

where  eAk  and  eBk  are  zero-mean  white  noise  sequences  of 
dimension  n,  which  represent  the  LNS  addition  error 
sequences.  Furthermore,  ezk  is  also  zero-mean  white  noise 
sequence   of   dimension  m,   which   represents   the   input 
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quantization  error.   The  effect   of   quantized   coefficients 
can  be  represented  as 

*  * 

eA,  =  eA,  +  AeA,  ;    eu.  =  eu,  +  Aeu, 


eCR  =  eCR  +  AeCk;    exQ  =  exQ  +  AexQ 


Now  define   the   error   between   the   actual   and   the 
theoretically  ideal  estimate  as 

ek  =  xk  -  xk  (6.6) 


where 


* 


x*    .    =    r      k  +  1    =    Yl,A*Y2,x*    +    Y3.u?    +    Y3.C^Y4.z.  (6.7) 

k+1  kkkk  kk  kkkk 


The  LNS  arithmetic  errors  are  represented   in   the   diagonal 
matrices  Yl,  ,  Y2k,  Y3k  of  dimension  nxn  and  Y4k  of  dimension 


mxm.   Specifically 


eb.|  i   eb~,        eb_, 
,;1     , .    ,    Ik     2k  nk . 

Ylk  =  diag  (r     ,r     ,....,  r     ) 

n7^  a      n-=l  j.  n-=l  j.  j. 

v-   ..   .  i=leAxlik   i=leAX2ik   i^2eAX3ik       eAn(n-l)k, 
Y2k=diag(r         ,r         ,r         ,....,r       '  ) 

eB,.+eA, ,   eBOI  +  eA-,,       eB  . +eA  . 
vi     *i*„    (r      lK    1k  ^   2k    2k      ,,   nk    nk . 
Y3k  =  diag  (  r         ,  r         ,  .  .  .  ,t  ) 

EZlk  +  iil  E^Z1  ik   eZ?k  +  i  «1  E<"Z?i  k 

Y4R=diag(r   ik  x    l        ilk,r  ZK    1-1    ^lK, 
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ez3k+i«2eCz3ik 


ezm+eAm(m-l)k. 

.  i  r  I 


Subtracting  (6.2)  from  (6.7)  and  use  of  (6.6)  yields 


ek+1  =  YlkA*Y2keR  +  [Y1UA*Y21,-A1,]x1,  + 


kH<   k  "kJ*k 


+  Y3kuk  "  uk  +  fY3kCkY4k-Ck)zk 


(6.8) 


We  define  an  auxiliary   augmented   state  Y^ER  such   that 

T    T  "  T   T 

y.  =(e,  ,x,  ,x.  )  .    Corresponding   augmentation  of  (6.8),  (6.2) 
and  (6.1)  yields 


*k  +  l  "  Gk*k  +  dk  +  fk'   *0 


(6.9) 


where 


Gk  " 


Yl.A*Y2.   Y1.aJy2.-A,    [  Y3.  C*Y4. -C.  ]H 
k  k   k     k  k   k   k 


0 
0 


k^k^k  ^k 

CkH 

F 


Y3kUk~Uk 


U, 


u, 


'Y3kckY4k-ck>vk 
ckvk 


w, 


Ax, 
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Mean  Error  Analysis 

From  the  stochastic  difference  (6.9)   for   yk,   we   can 
write  a  recursive  equation  for  the  mean  l.=E[ykJ  as  follows: 


1k+l  =  Vk  +  dk' 


(6.10) 


where 


Gk  = 


EYlkAREY2k   EYlkAkEY2k-AR   [ EY3kCREY4k-Ck ] H 
0  A,  CRH 


EY3kUk"Uk 


u. 


u. 


1«  - 


Ax, 


Furthermore 


EYlk  =  E[YlkJ  =  diag  (//,//, ,//) 


EY2,  = 


E[Y2R] 


EY3R  =  E[Y3k]  = 
EY4.  =  E[Y4,  ]  = 


, .         ,    n-1      n-1      n-2  , 

diag    {/j        ,/j        ,jj        ,....,//) 

2      2  2 

diag    (//    ,/j    ,....,//    ) 

, .         ,    m     m     m-1  2 . 

diag    {/j    ,/j    ,/u        ,  ....  ,/j    ) 


where 

e              rq/2-r~q/2  -F 

V   =   E[re]    =  -^ 1 ,         q=2    F 

qdnr) 
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Variance  Error  Analysis 

Computation  of  the  variance  is  more  complex   since   the 

matrix   G,   contains  random  variables.   Decomposition  of  the 

T 
covariance  matrix,  Pk=E[(yk~lk My^-^ )  1  and  state  matrix  Gk 

yields 


pk  =  ^Wi  -  hhT 


GR  =  G1R  +  G2k 


(6.11) 
(6.12) 


where 


Gl, 


-A, 


"CkH 
CkH 


G2k  = 


Yl.A*Y2,   Yl,A*Y2, 
k  k   k     k  k   k 


0 
0 


0 
0 


Y3,  C,  Y4,  H 
k  k   k 

0 
0 


From  the  stochastic  difference  (6.9)  for  yk ,  and  using 

(6.12)  we  can  write  a  recursive  equation  for  the  mean  square 

T 
error  Mk=E[ykyk  ]  as  follows: 


Mk+1  =  GlkMkGlkT  +  GlkMkG2kT  +  C1klkdkT  +  G2kMkGlkT  + 
+  E[G2kykykTG2kT]  +  E[G2RykdkT]  +  dklkGlkT  + 

+  E[dkykTG2kT]  +  E[dkdkT]  +  E[fkfkT],   MQ 
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where 


G2k    = 


EYlkAkEY2k 

0 

0 


EYlkAkEY2k 

0 
0 


Mo  - 


- 

0 

0 

0 

0 

0 

P0(-) 

EY3kCkEY4RH 

0 
0 


We  can  now  predict  the  mean  and  covariance  of  the 
actual  estimation  error  due  to  a  finite  precision  LNS 
implementation. 

Simulation  Studies 


The  purpose  of  this  section  is  to  investigate  how  well 
analytical  predictions  match  actual  experimental  results 
obtained  by  simulation,  and  to  compare  the  finite  precision 
conventional  Kalman  filter  and  the  logarithmic  Kalman  filter 
in  terms  of  speed  and  accuracy. 

For  the  purpose  of  comparisons  during  the  simulation 
studies,  the  double-precision  operations  of  a  VAX  750  with 
64  bits  are  taken  to  be  infinite-precision  operations. 

The  performance  of  the  Kalman  filter  (6.2)  for  a  fourth 
order  elliptic  lowpass  filter  was  investigated  for  both  LNS 
and  conventional  fixed  point  implementations.   In   order   to 
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avoid  overflow  operations  four  bits  were  used  to  form  the 
integer  field  of  a  N-bit  sign-magnitude,  fixed  point 
wordlength  format,  leaving  Fc=N-4-l  fractional  bits.  In  the 
N-bit  LNS  data  format  two  bits  were  used  to  cover  the  same 
integer  dynamic  range  resulting  in  F  -N-2-2  fractional  bits. 
A  Monte  Carlo  simulation  was  carried  out  with  10,000  sample 
paths  for  each  register-length  implementation  for  both  cases 
(i.e.  LNS  and  conventional).  The  average  degradation  in 
the  LNS  mean  E[e,  ]  and  the  LNS  covariance  cov[ek,ek]  of  the 
actual  estimate  can  be  compared  with  the  theoretical 
estimate  analytically  predicted  by  (6.9)  and  (6.11).  The 
results  of  the  simulation  and  the  analytical  prediction  for 
the  degradation  of  LNS  mean  are  shown  in  Figure  (6-1),  for 
8,  12,  and  16-bits  while  the  corresponding  LNS  covariance 
comparisons  are  shown  in  Figure  (6-2).  The  simulation 
results  are  indicated  with  marks  while  straight  lines  show 
the  analytical  predictions.  The  corresponding  results  for 
the  conventional  fixed  point  mean  and  covariance,  using 
Stripad's  error  model  [91],  are  illustrated  in  Figures  (6-3) 
and  (6-4). 

There  appears  to  be  good  agreement  between  the 
analytical  predictions  and  the  averaged  simulation  results. 
In  addition,  a  comparison  between  LNS  and  conventional  fixed 
point  results  demonstrates  the  superiority  of  the  LNS 
architecture  in  terms  of  accuracy,  as  illustrated  in  Figure 
(6-5).    The  straight  lines  represent  the  LNS  error  variance 
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Figure   6-1.      Plot   of   LNS   Error   Mean, 
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Figure  6-2.   Plot  of  LNS  Error  Covariance. 
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Figure  6-3.   Plot  of  Con.   Fixed  Point  Error  Mean. 
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Figure   6-4.      Plot   of   Con.      Fixed   Point   Error   Covariance. 
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Figure   6-5.      Comparison   between   Con.      Fixed   Point   and   LNS 
Error   Covariances. 


71 

and  marks  denote  the  conventional  fixed  point  error  variance 
for  8,  12  and  16  bits. 

Throughput  rates  of  the  Kalman  filter  (6.2)  can  be 
estimated  in  terms  of  the  system's  multiplication  time 
(TMULT)  and  add  time  (TADD).  Specifically,  the  execution 
time  for  a  n-dimensional  dynamic  system  with  a  measurement 
vector  of  order  m  is 

T  -=  n[(n+m) TMULT  +  (n+m)TADD]  (6.13) 

Using  the  Schottkey  TTL  (2500LS)  design  as  a  reference,  and 
2500LS  parts  with  fixed  point  addition  (TA)  37ns  plus  a  4K 
35ns  ROM  parameters  (TM),  then  TADD=2TA+TM=109nsand 
TMULT=TA=37ns  for  LNS  arithmetic.  Using  the  same  technology 
for  conventional  fixed  point  arithmetic  addition  and 
multiplication  can  be  performed  in  37  nsec  and  150  nsec 
respectively.  It  can  be  noted  that  the  multiplication 
operations  in  an  LNS  architecture  are  comparable  to 
conventional  fixed  point  addition.  However,  there  is  an 
advantage  of  LNS  addition  over  fixed  point  multiplication 
and  considering  the  fact  that  there  are  as  many 
multiplications  as  additions  in  (6.13),  the  LNS  superiority 
becomes  apparent.  If  pipeline  configuration  is  used,  where 
throughput  is  a  function  of  the  slowest  process  element,  the 
throughput  rate  for  LNS  is  37ns  and  the  throughput  rate  for 
fixed  point  is  150ns. 


CHAPTER  SEVEN 
ADAPTIVE  KALMAN  FILTERS 


The  Effect  of  Erroneous  Models 
on  the  Kalman  Filter  Response 

A  well-known  limitation  in  the  application  of  the 
Kalman-Bucy  filter  to  real-world  problems  is  the  assumption 
of  known  a  priori  statistics  for  the  stochastic  errors  in 
both  the  state  and  observation  process.  Wrong  statistics 
made  a  priori  can  lead  to  a  "wrong"  Kalman  filter.  A 
derivation  of  the  "wrong"  Kalman  filter  is  possible, 
assuming  wrong  a  priori  statistics. 

Given  an   a  priori   estimate   of   the   "wrong"   state 

* 
covariance   P~   and   given   the  "wrong"  a  priori  statistical 

information  of  the  input  (source)  noise   covariance   QI   and 

output  (measurement)  noise  covariance  RI ,  a  suboptimal  state 

of  the  system  defined  by  (6.1)  can  be  obtained   sequentially 

for  k=l,2,3...   similar  to  the  standard  Kalman  filter  having 


72 


73 


a  state  estimation 


xk+l  "  F(I"KkH)xk  +  uk  +  FKkzk  {1'l) 


an  error  covariance 


*      ,    *  v  *  T 


Pk+1  =  F(I-KRH)PkF   +  QI  (7.2) 

and  a  Kalman  Gain  Matrix 

**T,*T,-1  i  -i     -3  \ 

KR  =  PkHi[HPkHi+RI]  A  (7.3) 

Observe  that  for  an  optimal  Kalman  filter  studied  under  the 
case  that  QI=Q  and  RI=R,  then  Pk=Pk  is  the  covariance  of  the 
error  in  estimating  the  state.  Now  assume  that  the  actual 
gains  employed  in  the  filter  design  deviate  from  the  optimal 
gains ,  i.e.,  that 

Kk  -  Kk  +  &Kk 

In  order  to  determine  the  effect  of  these  errors  we  ask  the 
following  question.  How  close  do  we  come  to  the  optimum  by 
using  the  suboptimal  filter  equation  (7.1).  The  answer  is 
given  by  the  "actual"  covariance  matrix  of  the  estimation 
error  obtained  when  (7.1)  is  used  instead  of  (6.2).  The 
"actual"  covariance  matrix  denoted  by  PAk  indicates  the 
quality  of  the  suboptimal  estimate  and  is  defined  as 

PAk+l  -  E[(xk+l"xk+l)(xk+l-xk+l)T] 


74 

Considering  the  error  in  the  suboptimal  case  and  using 
both  the  system  model  (6.1)  and  suboptimal  state  estimation 
(7.1),  we  obtain  the  recursive  equation 

(xk  +  l-Xk+l>  "  (F(I-<H)Uk-V  +  Wk  +KkVk 
and  after  some  manipulation  we  have 

PAk+l  =  F(I-K*H)PAk(I-K*H)TFT  +  Q  +  FK*RK*TFT 

It  should  be  noted  that  the  "actual"  steady  state  error 
covariance  PA  depends  upon  both  the  a  priori  statistics  of 
the  system  model  (i.e.,  "wrong"  Kalman  gain,  K,  )  and  the 
true  noise  statistics  R  and  Q. 

In  order  to  quantify  the  difference  between  the 
"actual,"  optimal  and  "wrong"  steady  state  error  covariances 
consider  the  single-input,  single-output  model  (i.e.,  m~l 
and  wk  a  scalar  time  series)  where  Q  is  a  diagonal  matrix 
with  only  one  non-zero  entry  element  and  R  is  a  scalar 
matrix.  Figures  (6-1),  (6-2),  and  (6-3)  illustrate  these 
differences.  Here,  each  iteration  represents  the  Kalman 
filtering  of  a  256  point  time  series  with  R=RI  and  QI 
changing.  The  value  of  the  true  noise  statistics  R  and  Q 
are  shown  in  the  upper  right  corner  of  the  figures.  The 
trace  of  the  steady  state  error  covariances  are  plotted  for 
each  iteration  with  the  ratio  (RI/QI)  distributed 
logarithmically  from  .01  to  100.  The  straight  line 
represents   the   optimal  steady  state  error  covariance.   The 
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triangles  represent  the  "wrong"  steady  state  error 
covariance  while  the  stars  represent  the  "actual"  steady 
state  error  covariance  based  on  the  "wrong"  Kalman  gains. 
We  can  observe  that  the  crossover  point  of  these  curves  is 
the  optimal  case  in  which  we  obtain  the  optimal  Kalman  gain. 
In  addition,  the  "actual"  error  covariance  is  increasing 
after  the  crossover  point.  It  can  be  explained  by  the 
"wrong"  Kalman  filter  gain  which  is  decreasing  in  each 
iteration,  and  as  a  result  the  innovation  error  is 
magnified. 


The  Effect  of  Input  and  Output  Noise  Covariances 
on  the  Kalman  Gains 

The  performance  of  the  Kalman  filter  depends  on  the 
choice  of  the  a  priori  noise  statistics  RI  and  QI  which  in 
turn  define  the  optimal  steady  state  Kalman  gain.  A  proof 
will  be  presented  showing  that  in  order  to  obtain  the 
optimal  Kalman  gain  the  ratio  of  the  a  priori  noise 
statistics  (RI/QI)  must  be  equal  to  the  ratio  of  the  true 
noise  statistics  (R/Q). 

The  steady  state  optimal  Kalman  gain  and  the  steady 
state  optimal  error  covariance  are  given  by 

K  =  PHT[HPHT+R]~1  (7.4) 
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Figure   7-1.      Error   Covariances    for   R«0.01   and  Q-0.01 
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Figure  7-2.   Error  Covariances  for  R-0.1  and  Q-0.01 
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Figure  7-3.   Error  Covariances  for  R-0.01  and  Q-0.1 
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P  =  F[P-PHT(HPHT+R)-1HP]FT  +  Q  (7.5) 


while  the  suboptimal  steady  state  Kalman  gain  and  suboptimal 
steady  state  error  covariances  are  shown  to  be 

K   =  P  H  [HP  H  +RI]  (7.6) 


P*  =  F[P*-P*HT(HP*HT+RI)  1HP*]FT  +  QI  (7.7) 


Let  the  true  noise  statistics  be  equal  to  the  ratio   of   the 
"wrong"  noise  statistics,  or 

R  =  a  RI  (7.8) 

Q  =  a  QI  (7.9) 

Direct  substitution  of  (7.8)  and  (7.9)  in  (7.5)  yields 

P  =  F[P-PHT(HPHT+aRI)-1HP]FT  +  aQI 

or 

QI  =  -F(P/a)FT  +  FPHT(HPHT+aRI)_1H(P/a)FT  +  ( P/a )     (7.10) 

Solving  (7.7)  for  QI  and  equating  the  result   to   (7.10)   we 
have 
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0  -   -F[(P/a)-P*]FT  +  [(p/a)-P*]  + 

+   F[PHT(HPHT+aRI)~1H(P/a)-P*HT(HP*HT+RI)"1HP*]FT 
The  solution  of  the  above  equation  is 
P   -  (P/a)   or   P  -=  a  P  (7.11) 


Direct  substitution  of  (7.11)  and  (7.8)  in  (7.4)  results 
K  -  aP  ir[a(HP  h1+ri)  ]  -1 

*  T.   *  T     -1 
=  P  HA[HP  H  +RI]  X 

* 
=  K 

Therefore  the   optimal   steady   state   Kalman   gain   can  be 

defined   in   terms   of   the   ratio   of   the   a   priori  noise 

covariances   (Ri/Qi).    in  other  words  knowledge  of  the 

correct   ratio   of   the  noise  covariances  yields  the  optimal 

steady  state  Kalman  gain.   This  is   illustrated   in   Figures 

(7-4),   (7-5)   and   (7-6).   The  straight  line  represents  the 

optimal  steady  state  Kalman  gain.   The   triangles   represent 

the   "wrong"   steady   state   Kalman   gain  while   the   stars 

represent  the   "actual"   steady   state   Kalman   gain   to  be 

defined  in  the  next  section.   We   can  observe  that  the 

crossover  point  of  these  curves  is  the  optimal  case   and   at 

that   point   the   ratio   of  the  a  priori  noise  statistics  is 

equal  to  the  ratio  of  the  true  noise  statistics. 
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Figure  7-4.   Kalman  Gains  for  R=0.01  and  Q-0.01 
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Figure  7-5.   Kalman  Gains  for  R-0.1  and  Q-0.01 
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Adaptive  Kalman  Filtering 

We  have  seen  that  to  yield  optimal  performance  for  a 
Kalman  filter  it  is  necessary  to  provide  the  correct  a 
priori  descriptions  of  F,  H,  Q,  R  and  P0.  As  a  practical 
fact,  this  is  usually  impossible;  guesses  of  these 
quantities  must  be  advanced.  Hopefully,  the  filter  design 
will  be  such  that  the  penalty  for  misguesses  is  small.  But 
we  may  raise  an  interesting  question;  i.e.,  "Is  it  possible 
to  deduce  non-optimal  behavior  during  operation  and  thus 
improve  the  quality  af  a  priori  information?"  Within  certain 
limits,  the  answer  is  yes.  Using  the  innovations  property 
of  the  Kalman  filter  an  adaptive  algorithm  can  be  designed. 

The  innovations  error  process  of  a  Kalman  filter  is 
defined  as 

vk  -  zk  "  H*k 

The  innovations  property  states  that,  if  K  is  the  optimal 
gain, 

E[vkvk-j]  =  0;    j  "  ° 

In  other  words  the  innovations  process,  \>,  is  a  white  noise 
process.  Heuristically ,  there  is  no  "information"  left  in 
v^,  if  x,  is  an  optimal  estimate. 

Let  the  autocorrelation  function,  for  a  lag  variable  k, 
be  denoted  by  A(k).  Then  the  first  two  autocorrelation 
measures  of  the  innovation  error  process  are 
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A(0)  =  EfvkVk]  =  H  pA  HT  +  R  (7.12) 


A(l)  =  E[vk\>k_:[]  =  HF[PAHT  -  K*(HPAHT+R)] 

=  HF[KA  -  K  ][HPAHT+R]  (7.13) 


where  KA  is  the  steady  state  Kalman   gain   corresponding   to 
the  "actual"  error  covariance.   That  is 

KA  -  PAHT[HPAHT+R]_:L  (7.14) 


Note  that  when  optimally  configured,  the  optimal  Kalman  gain 
K  is  given  by  KA  in  Equation  (7.14)  and  A(l),  in  Equation 
(7.13),  diminishes  to  zero.  This  well  known  condition  of 
the  innovations  error  being  "white"  will  be  used  as  an 
adaptive  filter  design  objective.  Furthermore,  this 
observation  suggests  that  if  the  ratio  of  the  a  priori 
information,  namely  (RI/QI),  is  equal  to  the  ratio  of  the 
true  noise  statistics,  namely  (R/Q),  then  A(l)  is  always 
zero.  As  the  ratio  varies  up  or  down,  A(l)  moves  positively 
or  negatively  in  harmony.  Specifically  if  (R/Q) >(RI/QI ) , 
the  computed  Kalman  gain  is  greater  than  its  "actual" 
counterpart,  as  it  can  been  seen  in  Figures  (7-4),  (7-5)  and 
(7-6),  and  forces  A(l)  to  be  negative.  In  the  other  case 
where  (RI/QI ) >(R/Q) ,  the  results  are  opposite.  This  is 
illustrated  in  Figures  (7-7),  (7-8)   and   (7-9).    Here   the 
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second  term  of  the  autocorrelated  innovations  error  process, 
namely  A(l),  is  plotted  as  a  function  of  the  ratio  RI/QI . 
In  this  figure,  the  theoretical  results  are  indicated  with  a 
straight  line  while  the  stars  denote  the  experimental 
results . 


Algorithm  Implementation 

To  implement  the  proposed  algorithm,  several  key 
functional  units  need  to  implemented  and  they  are 

1.  A  shift-invariant  Kalman  filter. 

2.  An  autocorrelator . 

3.  Adaptive  search  and  decision  rule. 

They  will  be  briefly  developed  as  follows: 

Kalman  Filter 

It  is  desired  to  realize  a  given  four-tuple  (F,H,QI,RI) 
description  of  the  nth  order  SISO  Kalman  filter  as  a  high 
throughput  filter.  Fast  systolic  and  SIMO  architectures 
have  been  proposed  to  achieve  this  goal  [92].  Pipelining 
can  also  be  used  to  increase  the  effective  throughput  of  an 
implemented  Kalman  filter.  These  methods  will  be  presented 
in  Chapter  8.  In  addition  using  LNS  Kalman  filters  higher 
throughput  and  accuracy  can  be  obtained. 


87 


i 
i 

! 

i 

2 

1  - 

0  5 

o.c 

-0.5 

- 

2 

: 

-< 

* 

g 

-L.0 

\ 

•4 
k      -'      * 

* 

1— - 
I—- 

1 — 1 

< 

—  -co 

1 

* 

< 

-* 

-5.5 

: 

j  i  m  1 1  i  1       ;    i  i  ii  1 1 1 1       i    i  i  i  1 1 1 1 1       I    i  i  1 1 1 1 1 

10"' 

10"               to0                to1 

RATIO     (RI/QI) 

R=0.01 
Q=0.01 


Figure  7-7.   Autocorrelation  of  Innovation  Process 
for  R-0.01  and  Q=0.01 


88 


4  - 

* 

: 

z 

n 

^         .  •  - 

»                 * 

*                                -" 

•■■ 

c 

1.3'" 

x 

^ 

^ 

< 
> 

-4  - 

'-< 

o 

i 

-9  - 

/ 

CL 

*./ 

.' 

o 

z 

o 

-12  - 

*.  / 

: 

E-« 

* 

f 

i 

3 

-16  - 

*  / 
/     * 

o 

'**  / 

/ 

o 

o 

-20  - 

/ 

1 

E- 

/ 

D 

/ 

«* 

-24  - 
-28  - 

1 

j 

i 

I 
i 

<JC 

1 

1    1  1  1 1  III         1     1    1  1  1  1  III         I     Mill 

ll|        , 

i    II  1  lllj 

1 

o- 

ID"1                    10° 

RATEO    (RI/QI) 

I0l 

1 

1 
! 

R=0.  1 
Q»0.0 


Figure  7-8 


Autocorrelation  of  Innovation  Process 
for  R-0.1  and  Q-0.01 


89 


!R=0.ei 

Q=B.1 


4  - 


;  2 

!  z 

i 

i 
i 

*• 

i 
j 

i  ^ 

#.--3C 

2 

1 

o 

o 

£- 

< 

1  - 

0  - 

-1  - 

i 

/  *  * 

y 
/ 

* 

/  * 

■■ 

1  *  *  / 

-2  - 

/ 

* 

O 

I 

1 1 1 1 1 hi   i  1 1 1 1 1 hi    i  ii 

i  1 1  in        i 

1    1  1  III 

10"' 

10"'                   10° 

101 

RATIO     (RI/Ql) 

Figure  7-9.   Autocorrelation  of  Innovation  Process 
for  R-0.01  and  Q-0.1 


90 


Autocorrelation 

A  discrete  autocorrelation  function  is  given  by 

N 
r(k)  =  Z     x(n)  x(n+k) 
n-1 

If  r(k)  is  desired  for  a  range  of  k  on  the  order  N/2,  then 
using  the  FFT  to  compute  DFT  [x(f)x  (f)]=r(k)  is 
computationally  efficient.  For  relatively  short  delay 
analysis  (i.e.,  k<<N),  the  Pfeifer/Blankenship  (BP) 
algorithm  is  optimum  from  a  computationally  complexity 
viewpoint.  The  BP  version  of  the  autocorrelation  algorithm 
satisfies 

p-1   k 
r(k)  =  Z        Z      x(2jk+i+k)  [x(2jk+i)  +  x(2jk+i+2k)] 
j=0  i=l 

k  =  1  2,  ...,  p;    p=INT(N/2k) 

For  N>>p,  the  BP  algorithm  essentially  halves  the  number  of 
multiplications  normally  associated  with  direct  computation 
and  can  also  be  considerably  less  than  the  associated  with 
FFT  mechanizations. 

The  speed  of  the  BP  algorithm  can  be  further  improved 
by  eliminating  the  time  consumed  to  compute  data  invariant 
indices  (i.e.,  (2jk+i+k),  (2jk+i),  (2jk+i+2k)).  This  can  be 
achieved  through  the  use  of  so  called  inline  code  or  knotted 
code  [93].  Therefore,  in  the  context  of  the  proposed 
adaptive   filter,  it  is  apparent  that  the  computation  of  the 
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autocorrelation   function   can  be   performed   at   real-time 
speeds  (i.e.,  t( autocorrelation)  <  t(Kalman  filter)). 

Another  alternative  to  the  BP  algorithm  is  the 
dedicated  hardware  unit,  illustrated  in  Figure  (7-10).  It 
consists  of  two  multiply-add  modules  in  parallel 
configuration.  Each  module  calculates  each  of  the 
autocorrelation  coefficients,  namely  A(0)  and  A(l),  at  real 
time  speed. 

Search  Algorithm 

In  order  to  efficiently  estimate  the  ratio  R/Q,  a 
modified  Fibonacci  search  has  been  developed,  as  illustrated 
in  Figure  (7-11).  Letting  f(x)  denote  a  real-valued 
function  of  a  single  real  variable  x,  assume  that  f(x)  has  a 
invariant  minimum  at  x  in  some  real  interval  [a*^*].  The 
purpose  of  the  search  algorithm  is  to  sequentially  reduce 
the  search  initial  interval  to  [ an/n) < [ an_1 #bn_1 ]  which  also 
contains  x  after  the  nth  iteration.  In  the  proposed 
algorithm,  the  points  x,  through  x,-,  are  chosen  to  divide 
[an,bn]  into  four  subintervals  of  equal  length  and  then 
compute  f(aQ)  and  f(bQ).  If  | f ( aQ ) | > | f (bQ ) | ,  in  the  next 
iteration  ai~x2  an<*  l3i=bn"  Otherwise,  a-^ag  and  b^=x4«  As 
a  result,  the  new  interval  has  been  reduced  by  one 
subinterval  so  that  after  two  iterations  the  interval 
[aQ,bQ]  has  been  halved.  This  process  is  repeated  until  the 
prespecified  interval  of  uncertainty  is  reached. 
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Figure  7-10.   Hardware  Autocorrelation  Unit. 
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Figure  7-11.   Modified  Fibonacci  Search  Algorithm. 
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Algorithm  Integration 

The  proposed  algorithm  consists  of  two  Kalman  filters 
"running"  in  parallel,  Figure  (7-12).  During  the  first 
iteration,  the  first  term  of  the  autocorrelation,  namely 
A(0),  is  used  to  estimate  R.  Then  using  as  RI  the  value  of 
A(0),  the  two  Kalman  filters  commence  o  modified  Fibonacci 
search  at  the  assumed  upper  and  lower  limit  of  the  ratio 
(RI/QI)  respectively.  After  each  iteration  A(l)  is 
calculated  using  the  fast  BP  algorithm  for  each  of  the 
Kalman  filters  as  a  function  of  the  ratio  (RI/QI).  This 
data  is  presented  to  the  search  algorithm  and  an  update 
estimate  of  QI  is  determined  for  each  of  the  Kalman  filters. 
After  the  interval  on  uncertainty  of  the  ratio  is  reached, 
the  estimated  ratio  is  accepted  as  the  best  estimate  of  this 
metric.  The  parameter  R  is  the  calculated  using  Equation 
(7.12).  Finally,  from  estimated  ratio  metric  and  R,  Q  can 
be  computed. 

Mehra's  Adaptive  Algorithm 

A  method  was  developed  by  Mehra  in  1970  in  order  to 
identify  the  input  and  output  noise  covariances  in  Kalman 
filters.  First,  a  correlation  test  is  performed  on  the 
filter  to  check  whether  it  is  working  optimally  or  not.  The 
test  is  based  on  the  innovation  property  of  an  optimal 
filter.    In  order  to  perform  the  test,  the  autocorrelation 
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function  of  the  innovations  process  A(k)  is  calculated. 
Then  estimates  of  the  normalized  autocorrelation 
coefficients,  p.  ,  are  obtained  by  dividing  the  elements  of 
A(k)  by  the  appropriate  of  A(0),  e.g., 


[pk]ij 


[A(k)J, 


11 


{[A(0)..][A(0)jj]} 


172 


Then  look  at  a  set  of  values  for  [p.]..,  k>0  and   check   the 

1/2 
number   of   times   they   lie   outside  the  band  +(1.96/N  '  ), 

where  N  is  the  sample  points.   If  this  number  is  less  than  5 

percent  of  the  total  the  sequence  v,  is  white. 

If  the  test  reveals  that  the  filter  is  suboptimal,  the 
next  step  will  be  to  obtain  better  estimates  of  Q  and  R. 
This  can  be  done  using  A(k)  computed  earlier. 

First  obtain  an  estimate  of  PAH  ,  using  the  following 
equation: 


PAHT  =  K*A(0)  +  r# 


A(0) 
A(l) 


A(n) 


where 


r*  -  (rTr)  1rT 


and 
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HF 


HF( I-K  H)F 


H[F(I-K*H) ]n  XF 


Then  an  estimate  of  R  is  obtained  using  (7.12) 

R  =  A(0)  +  H  PA  HT 

Finally  we  obtain  an  estimate  of  Q  using  the  "actual"  steady 
state  error  covariance  estimate 


PA  =  F(I-K*H)PA( I-K*H)TFT  +  Q  +  FK*RK*TFT 


Based  on  the  new  estimates  of  R  and  Q  a  new  estimate  for  the 
Kalman  gain  can  be  obtained.  Again  the  whole  procedure  can 
be  repeated  until  the  correlation  test  is  successful. 


Carew's  and  Balanger's  Adaptive  Algorithm 

A  discrete  linear  stationary  system  is  considered  for 
which  the  input  noise  covariance  Q  and  the  output  noise 
covariance  R  are  unknown.  A  stable  filter  with  a  suboptimal 
gain  is  assumed.  The  identification  scheme  uses  the 
autocorrelation  functions  of  the  innovations  sequence  of  the 
suboptimal  filter  to  determine  the  optimum  filter  steady 
state  gain  K  directly  without  the  intermediate  determination 
of  the  unknown  covariances  Q  and  R.   The  approach  used  is  to 
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identify  an  output  equivalent  representation  of  the  original 
system  which  does  not  involve  the  unknown  covariances 
directly.  The  identification  scheme  starts  with  correlation 
measurements  and  uses  an  iterative  algorithm  which  is  a 
contraction  mapping  under  the  mild  assumption  of  complete 
controllability  and  observability  of  the  given  system. 

The   algorithm   uses    the    zeroth   term   of    the 
autocorrelation   function   of  the  innovations  process  for  an 

optimum  filter  denoted  by  W  and  the  suboptimal  steady   state 

I 
error  covariance  P  which  are  defined  by 

A    T 

W  =  HPH   +  R 

P#  *  Et(;k  +  l-;k  +  l)^k  +  l-;k  +  l)T] 

In  this  algorithm  first  an  estimate  of  W  is   obtained   using 

#       * 
as  first  estimate  of  P   the  P   as  follows: 

W  =  A(0)  -  HP#HT  (7.15) 


Then  an  estimate  of  the  Kalman  gain  can  be  obtained  using 
K  =  (B#C  -  P#HT)  W_1  (7.16) 


where 


BT=[HT       FTHT       ....        (fVV] 
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C  = 


A(1)+HFK  A(0) 
A(2)+HFK*A(1)+HF2K*A(0) 


A(n)+HFK*A(n-l)+ +HFnK*A(0) 


Finally  an  updated  estimate  of  the  suboptimal   steady  state 
error  covariance  is  calculated 


P*  =  F(I-K*H)P#(I-K*H)TFT  +  F(K*-K)W(K*-K)T 


(7.17) 


The  iteration  scheme  ( 7  .15 )-( 7 . 17 )  converges  uniquely  to  the 
optimal  gain  if  the  autocorrelation  functions  A(j)  are 
accurately  known. 


Experimental  Results 

Three  fourth  order  elliptic  lowpass  filters  were 
simulated  on  a  VAX  750  computer.  For  each  filter, 
thirty-six  different  input  and  output  covariances  cases  were 
tested  over  a  ratio  range  from  0.02  to  50.  Estimates  of 
their  noise  covariances  were  compared  with  results  of 
Mehra's  [45]  and  Carew's  and  Balanger's  [50]  algorithms  with 
respect  to  accuracy  and  timing.  After  estimating  the 
unknown  covariances  and  the  resulting  Kalman  gain  for  each 
of  the  algorithms,  their  performance  was  compared  to  the 
optimal   Kalman   filter  in  a  mean  square  error  sense.   These 
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results  are  illustrated  in  Figures  (7-13),  (7-14),  and 
(7-15).  The  lines  with  dots  found  in  Figures  (7-14)  and 
(7-15)  represent  the  case  where  Mehra's  and  Carew's 
algorithms  did  not  converge.  The  estimation  time  of  the 
unknown  covariances  for  each  case  is  illustrated  in  Figures 
(7-16),  7(-17)  and  (7-18).  The  estimation  time  was  based  on 
the  number  of  additions  and  multiplications  required  in  each 
algorithm  assuming  precomputed  Kalman  gains.  For  example, 
if  the  Weitek  1032  and  1033  32-bit  floating  point  ship  set 
was  used  then  tADD=tMULT=910  ns-  :t  nas  been  observed  from 
the  conducted  experiments  that  the  optimallity  test  used  in 
Mehra's  algorithm  was  not  always  consistent  in  judging  the 
accuracy  of  the  noise  statistics,  and  as  a  result  some 
errors  occured  as  illustrated  in  Figure  (7-14).  In  addition 
the  number  of  iterations  needed  in  identifying  the  noise 
statistics  in  Mehra's  algorithm  is  not  known  a  priori.  This 
makes  real-time  identification  almost  impossible.  In  the 
proposed  algorithm  based  on  a  modified  Fibonacci  search,  the 
number  of  required  iterations  are  known  in  advance.  This 
combined  with  af  fast  autocorrelation  algorithm  means 
real-time  throughput  can  be  achieved.  On  the  other  hand, 
Carew's  algorithm  was  very  sensitive  to  the  calculated 
coefficients  of  the  autocorrelation  function  of  the 
innovations  process  and  as  a  result  many  errors  occured  as 
illustrated  in  Figure  (7-15).  Since  Kalman  filtering  and 
autocorrelation  estimation  was   required  inly  on  the  first 
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iteration,  Carew's  algorithm  was  slightly   faster   than   the 
proposed  scheme. 
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Figure  7-13.   Mean  Square  Error  of  Adaptive  Kalman  Filter. 
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CHAPTER  EIGHT 
IMPLEMENTATION  OF  KALMAN  FILTERS  USING 
SYSTOLIC  ARCHITECTURES 


Systolic  Architectures 

Computational  complexity  has  traditionally  been  an 
important  measure  of  the  utility  of  digital  signal 
processing  and  matrix  arithmetic  algorithms;  however,  the 
digital  implementation  of  such  algorithms  has  often  been 
difficult  (or  impossible)  since  they  tend  to  the 
compute-bound.  Consequently,  the  quest  for  real-time 
processing  has  resulted  in  a  concerted  push  for  the 
development  of  faster  computing  structures  as  well  as 
algorithms  of  lower  computational  complexity. 

With  the  advent  of  VLSI,  many  processing  elements  can 
now  be  realized  on  a  single  chip  and  as  hardware  cost  and 
size  continue  to  drop  and  processing  requirements  become 
well-understood  in  areas  such  as  signal  and  image 
processing,  more  special-purpose  systems  are  being 
constructed.   However,  since  most  of  these  systems  are  built 
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on  an  ad  hoc  basis  for  specific  tasks,  methodological  work 
in  this  area  is  rare. 

In  order  to  correct  this  ad  hoc  approach  the  concept  of 
systolic  architecture  was  introduced  [92].  Specifically,  a 
systolic  architecture  is  a  methodology  for  mapping 
high-level  computations  into  hardware  structures.  In  a 
systolic  system,  data  flows  from  the  computer  memory  in  a 
rhythmic  fashion,  passing  through  many  processing  elements 
before  it  returns  to  memory,  much  as  blood  circulates  to  and 
from  the  heart.  The  system  works  like  an  automobile 
assembly  line  where  different  people  work  on  the  same  car  at 
different  times  and  many  cars  are  assembled  simultaneously. 
An  assembly  line  is  always  linear,  however,  and  systolic 
systems  are  sometimes  two-dimensional.  They  can  be 
rectangular,  triangular,  or  hexagonal  to  make  use  of  higher 
degrees  of  parallelism.  Moreover,  to  implement  a  variety  of 
computations,  data  flow  in  a  systolic  system  may  be  at 
multiple  speeds  in  multiple  directions,  both  inputs  and 
(partial)  results  flow,  whereas  only  results  flow  in 
classical  pipelined  systems.  Generally  speaking,  a  systolic 
system  in  easy  to  implement  because  of  its  regularity  and 
easy  to  reconfigure  (to  meet  various  outside  constraints) 
because  of  its  modularity. 

A  systolic  system  consists  of  a  set  of  interconnected 
cells,  each  capable  of  performing  some  simple  operation. 
Because  simple,  regular  communication  and  control  structures 
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have  substantial  advantages  over  complicated  ones  in  design 
and  implementation,  cells  in  a  systolic  systems  are 
typically  interconnected  to  form  s  systolic  array  or  a 
systolic  tree.  Information  in  a  systolic  system  flows 
between  cells  in  a  pipelined  fashion,  and  communication  with 
the  outside  world  occurs  only  at  the  "boundary  cells."  For 
example,  in  a  systolic  array,  only  those  cells  on  the  array 
boundaries  may  be  I/O  ports  for  the  system. 

Computational  tasks  can  be  conceptually  classified  into 
two  families,  compute-bound  computations  and  I/O-bound 
computations.  In  a  computation,  if  the  total  number  of 
operations  is  larger  than  the  total  number  of  input  and 
output  elements,  then  the  computation  is  compute-bound, 
otherwise  it  is  I/O-bound. 

Systolic  architectures  can  be  used  to  speed  up 
compute-bound  computation  in  a  relative  simple  and 
inexpensive  manner.  The  basic  principle  of  a  systolic 
architecture,  a  systolic  array  in  particular,  is  illustrated 
in  Figure  (8-1).  By  replacing  a  single  processing  element 
with  an  array  of  PEs,  or  cells  a  higher  computation 
throughput  can  be  achieved  without  increasing  memory 
bandwidth.  The  function  of  the  memory  in  the  diagram  is 
analogous  to  that  of  a  heart;  it  "pulses"  data  (instead  of 
blood)  through  the  array  of  cells.  The  crux  of  this 
approach  is  to  ensure  that  once  a  data  item  is  brought  out 
from  the   memory  it  can  be  used  effectively  at  each  cell  it 
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Figure   8-1.  Basic   Principle   of   a   Systolic   System. 
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passes  while  being  "pumped"  from  cell  to  cell  along  the 
array.  This  is  possible  for  a  wide  class  of  compute-bound 
computations  where  multiple  operations  are  performed  on  each 
data  item  in  a  repetitive  manner. 

Being  able  to  use  each  input  data  item  a  number  of 
times  (and  thus  achieve  high  computational  throughput  with 
only  modest  memory  bandwidth)  is  just  one  of  the  many 
advantages  of  the  systolic  approach.  Other  advantages  are 
modular  expansibility,  simple  and  regular  data  and  control 
flows,  use  of  simple  and  uniform  cells,  elimination  of 
global  broadcasting,  and  fast  response  time. 

As   an   example   the    convolution   problem   can   be 
considered.    We  consider  the  convolution  problem  because  it 
is  a  simple  problem  with  a  variety  of  enlightening   systolic 
solutions,   because   it   is  an  important  problem  in  it's  own 
right,  and  more  importantly,  because  it  is  representative  of 
a  wide   class   of   computations  suited  to  systolic  designs. 
The  convolution  problem  can  be   viewed  as   a   problem  of 
combining   two  data   streams,   wi ' s   and  x.'s,  in  a  certain 
manner  to  form  a  resultant  data  stream  of  y.'s.   The  type  of 
computation   is   common  to  a  number  of  computation  routines, 
such   as    filtering,    pattern   matching,     correlation, 
interpolation,   polynomial   evaluation   (including   discrete 
Fourier   transforms),   and  polynomial   multiplication   and 
division. 
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The  convolution  problem  is  compute-bound,  since  each 
input  x.  is  to  be  multiplied  by  each  of  the  k  weights.  If 
the  x.  is  input  separately  from  memory  for  each 
multiplication,  then  when  k  is  large,  memory  bandwidth 
becomes  a  bottleneck,  precluding  a  high-performance, 
solution.  As  indicated  earlier,  a  systolic  architecture 
resolves  this  I/O  bottleneck  by  making  multiple  use  of  each 
x.  fetched  from  the  memory.  Based  on  this  principle,  a 
systolic  design  for  solving  the  convolution  problem  is 
illustrated  in  Figure  (8-2).  Weights  are  preloaded  to  the 
cells,  one  at  each  cell,  and  stay  at  the  cells  throughout 
the  computation.  Partial  results  y.  move  systolically  from 
cell  to  cell  in  the  left-to-right  direction,  that  is,  each 
of  them  moves  over  the  cell  to  its  right  during  each  cycle. 
At  the  beginning  of  a  cycle,  one  x.  is  broadcast  to  all  the 
cells  and  one  y. ,  initialized  as  zero,  enters  the  left-most 
cell.  During  cycle  one,  w.x..  is  accumulated  to  y.  at  the 
left-most  cell,  and  during  cycle  two,  w,x~  and  w2x~  are 
accumulated  to  y2  and  y.  at  the  left-most  and  middle  cells, 
respectively.  Starting  from  cycle  three,  the  final  (and 
correct)  values  of  yi,y2,...  are  output  from  the  right-most 
cell  at  the  rate  of  one  y.  per  cycle. 

The  design  presented  above  by  no  means  exhausts  all  the 
possible  systolic  designs  for  the  convolution  problem.  Rung 
[92]  presents  five  additional  systolic  designs  for  the  same 
problem.    The   convolution   problem   is   just   one   of  many 
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computer-bound   computations   that   can   benefit   from   the 
systolic   approach.    Systolic   designs   using   (one   or  two 
dimensional)  array  or  tree  structures  are  available  for   the 
following  regular,  compute-bound  computations. 
Signal  and  image  processing: 

FIR,  IIR  filtering  and  1-D  convolution; 

2-D  convolution  and  correlation; 

discrete  Fourier  transform; 

interpolation; 

1-D  and  2-D  median  filtering;  and 

geometric  warping. 
Matrix  arithmetic: 

matrix-vector  multiplication; 

matrix-matrix  multiplication  [93,94]; 

matrix  triangularization; 

LU  decomposition;  and 
solution  of  triangular  linear  systems. 
In  general  systolic  designs  apply  to   any   compute-bound 
problem   that   is   regular,   that   is,   one  where  repetitive 
computations  are  performed  on  a  large  set  of  data. 


Orthogonal  Systolic  Arrays 

Orthogonal  systolic  arrays  are  two-dimensional  systolic 
arrays.  As  in  one-dimensional  systolic  arrays,  data  in 
orthogonal  arrays  may  flow  in  multiple   directions   and   at 
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multiple  speeds.  Our  objective  is  to  use  orthogonal 
systolic  structures  to  implement  recursive  matrix  operations 
very  fast.  With  systolic  architectures  the  matrix 
multiplication  can  be  reduced  from  0(n  )  to  0(n). 

Kungfs  Systolic  Array 

We  consider  the  problem  of  multiplying  two  (nxn) 
matrices.  It  is  easy  to  see  that  the  matrix  product  Z-(z^.) 
of  A=(a. .)  and  B»(b. .)  can  be  computed  by  the  following 
recurrences : 

zi1*  -  0 
ID 

z<k+1)  =  z!k)+  a.wbtH 
i]        i]     lk  k] 

(n+1) 

z  .  .  =  z  :  . 

ij     i] 

The  simplest  matrix  multiplication  orthogonal  array  and  its 
cell   definition   are  depicted  in  Figure  (8-3).   The  columns 

A.  and  rows  B.   are   broadcasted   along   the   square   array, 

2 
consisting   of   n   cells,  as  shown  in  Figure  (8-3).   In  each 

cell  or  PE  element   the   result   z.    stays   at   a   cell   to 

accumulate   its   terms,   allowing  efficient  use  of  available 

multiplier-accumulator  hardware.   So  each  cell  performs   the 

operation   z.  .-=z.  .+a..b,  ..    The  matrix  elements  a.,  and  b,  . 
r  ij   ij   ik  k]  lk      k  j 

are  broadcasted  to  the  nearest  PE  element  as  shown  in  Figure 
(8-3).  For  a  multiplication  of  two  (nxn)  matrices,  Z=A«B, 
the  computation   time   is   (4n-l)   computational   units.    A 
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computational  unit  is  defined  as  the  time  required  to 
compete  the  multiply-add  operation  in  one  cell.  The  (4n-l) 
units  can  be  verified  by  the  fact  that  n  units  are  needed  to 
compute  z..,  (n-1)  units  for  computing  zln*  n  units  for  the 
computation  of  z  and  finally  n  units  to  shift  the  final 
result  out  of  the  orthogonal  systolic  architecture. 

For  the  computation  of  Z=A-B+C,  n  additional  units  are 
required  to  shift  in  matrix  C  and  have  z\.  ,  bringing  the 
total  computation  time  to  (5n-l)  units.  However,  for  an 
operation  Z=A«B+OD,  the  computation  time  remains  ( 5n-l ) 
units  and  can  be  verified  by  the  fact  that  ( 3n-l )  units  are 
required  for  A-B,  without  shifting  the  result  out,  n  units 
for  C-D,  assuming  proper  pipelining  is  used  and  n  units  for 
shifting  the  result  out.  Finally  for  an  (nxn)-(nxm)  matrix 
multiplication,  ( 3n+m-l )  computational  units  are  required. 

Liu's  and  Young's  Systolic  Array 

Again  consider  the  multiplication  of  two  (nxn) 
matrices,  Z=A«B.  Assuming  the  matrix  A  is  already  inside 
the  multiplication  array,  B  can  be  piped  in  to  interact  with 
A  to  produce  the  multiplication  result  as  shown  in  Figure 
(8-4).  During  each  computation  cycle,  all  the  cells  in  the 
array  perform  the  same  multiplication  step.  Each  cell  PE. . 
takes  the  sum  of  partial  products  from  the  left  neighbor  and 
add  it  to  its  partial  product  of  (a*b),  and  then  passes  the 
sum  to  its  right  neighbor  for  the  next  multiplication   step, 
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as  illustrated  in  Figure  (8-4).  For  each  multiplication 
step,  the  data  stream  of  B  is  piped  one  row  deeper  into  the 
array. 

The  computation  time  of  the  multiplication  is  (4n-2) 
units,  including  n  units  of  matrix  A  loading  time  and  (3n-2) 
units  of  multiplication  time.  A  PE,  once  activated, 
performs   the   necessary  multiplication  at  every  cycle  time, 

until  there  is  no  more  data  entering  the  PE.   At  the  peak  of 

2 
computation,   about   75   percent   of   the   n    PE's   compute 

simultaneously.   The  (3n-2)  units  of  multiplication  time  can 

be   verified   by  the  fact  that  z^    is  piped  out  of  the  array 

after  n  multiplications,  and   the   time   difference   between 

piping   out  z, ,  and  z    is  ( 2n-2 )  units  due  to  startup  time 

delay,    as    shown   in   Figure    (8-4).    Although   the 

multiplication  computation  time,  (4n-2)  units,  is  almost  the 

same  as  in  Kung's  algorithm,  (4n-l)   units,   it's   advantage 

can   be   seen   in   the   computation   time   of   the  operation 

Z=A-B+C.   This  algorithms  will   require   exactly   the   same 

computational  units  as  in  multiplication,  ie.,  (4n-2)  units, 

compared  to  (5n-l)  computational  units  of  Kung's   algorithm. 

In    addition,    for   an   operation   Z=A'B+A«C,    (5n-2) 

computational  units  will   be   required   since   matrix  A   is 

already  loaded. 
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Pipelined  Systolic  Array 

The  new  proposed  scheme  is  an  improvement  of  Liu's  and 
Young's  systolic  array  processor.  Instead  of  matrix  A  to  be 
loaded  first  and  then  operations  taking  place,  matrix  A  and 
matrix  B  will  be  pipelined,  so  operation  can  take  place 
before  A  is  completely  loaded  into  the  array  processor,  as 
illustrated  in  Figure  (8-5).  In  other  words,  loading 
process  and  arithmetic  operations  can  be  performed 
simultaneously,  yielding  higher  PE  utilization. 

For  a  multiplication  of  two  (nxn)  matrices,  Z=A-B,  the 
computation  time  is  (2n)  units,  n  computational  units  for 
z1 1  to  be  piped  out  and  n  units  for  the  next  loading  matrix, 
C,  to  be  partially  loaded  for  the  next  matrix 
multiplication.  After  2n  computational  units  the  initial 
partial  result  y..,  of  the  next  operation  Y=OD,  is  ready  to 
be  calculated.  It  should  be  noted  that  the  result  znn  is 
not  readily  available  at  the  output  of  the  systolic  array 
when  y1 1  is  processed.  Thus,  there  is  overlapping  between 
two  successive  multiplications  and  higher  PE  utilization  can 
be  achieved.  For  an  operation,  Z=A-B+C,  2n  computational 
units  are  required,  the  same  as  in  the  multiplication.  The 
computational  time  for  an  operation,  Z=A*B+A«D,  is  3n  units. 
For  an  (nxn)-(nxm)  matrix  multiplication  the  computational 
time  is  (n+m)  units. 

In  a  general  recursive  algorithm  the  order  of  matrix 
operations   must  be  carefully  selected  in  order  to  take  full 
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advantage  of  the  properties  of  the  pipelined  systolic 
algorithm.  One  drawback  of  the  algorithm  is  that  the 
elements  of  the  loading  matrix  are  structured  differently 
than  the  resultant  matrix  so  some  delay  is  required,  (n-2) 
computational  units,  to  transform  the  resultant  matrix  into 
a  loading  matrix  to  be  used  for  the  next  operation.  For 
example,  in  an  operation  Z=A-B-C,  (B-C)  is  required  to  be 
performed  first  and  then  A- (B-C),  instead  of  (A-B)  and  then 
(A»B)*C.  Using  the  first  method,  the  resultant  matrix, 
(B-C),  can  be  used  directly  as  a  multiplier  matrix, 
maintaining  throughput.  If  the  second  method  was  chosen,  an 
additional  delay,  (n-2)  computational  units,  is  required  to 
transform  the  resulatant  matrix,  (A-B),  into  proper  form  to 
be  used  as  a  loading  matrix. 

Kalman  Filtering  Using  Systolic  Arrays 

This  section  describes  the  restructuring  of  the  Kalman 
filter  equations  into  a  form  suitable  for  implementation  by 
the  systolic  arrays,  presented  in  the  previous  section.  The 
Kalman  filter  equations  are  inherently  parallel,  due  to 
their  vector-matrix  form,  but  they  must  be  recast  with 
consideration  for  flows,  concurrency  and  dependencies,  to 
take  advantage  of  the  parallel  hardware. 

An  expression  tree  [95]  can  be  constructed  to  reveal 
information   about   flow,  concurrency  and  dependencies.   The 
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tree  is  essentially  binary,  except  for  unary  operators  like 
matrix  inversion.  Transposition  will  be  considered  a 
retrieval  operation  and  not  an  executable  operation  by 
itself.  To  help  with  analysis  of  this  tree,  two  types  of 
subtrees  need  to  be  identified.  The  first  is  the  duplicate 
subtrees,  which  can  be  evaluated  once  and  the  results  stored 
for  reuse.  The  other  subtrees  to  be  identified  are 
designated  by  Irani  and  Chen  [95]  as  maximum  p-subtrees. 
These  are  subtrees  in  which  the  root  is  either  the  root  of 
the  entire  tree  or  a  child  of  a  node  with  a  different 
arithmetic  precedence. 

Having  identified  these  subtrees,  commutative, 
associative,  and  distributive  laws  for  matrices  can  be  used 
to  manipulate  the  expression  tree  to  adapt  it  for  a 
particular  machine.  The  commutative  law  allows  addition  and 
scalar-matrix  multiplication  nodes  to  be  reshuffled  without 
invalidating  the  result.  Associative  laws  allow  in-order 
shifting  of  nodes  within  the  same  maximal  p-subtree.  Thus, 
a  subtree  can  be  left  or  right  justified  to  maintain 
execution  flow.  The  distributive  law  allows  common 
subexpressions  to  be  factored  out  of  a  subbackbone  of 
addition  and  subtraction  nodes.  The  common  subexpression 
becomes  the  child  of  a  new  multiplication  node  with  the 
remaining  expression  as  the  other  child. 

For  the  Kalman  filter,  the  initial  equations  can  be 
expressed  as 
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x  -  F'X    +  F«K«(z  -  H«x) 

P  =  F-P-FT  -  F-K«H-P'FT  +  G-Q-G 

K  =  P-HT- (H-P-HT+R)~ 

The  tree  form  of  these  equations  is  shown  in  Figure  (8-6). 
Common  subtrees  and  maximal  p-subtrees  are  indicated.  The 
first  restructuring  is  done  to  shift  the  computational 
backbone  to  the  right  using  the  commutative  law.  Two 
rotations  are  done  involving  addition  nodes.  The 
associative  law  is  used  to  shift  four  multiplication 
subtrees,  but  notice  that  the  operands  are  still  in  the  same 

order . 

T 
The  final  manipulation  factors   out   P«F    in   P.    The 

expression  P-H  can't  be  factored  in  K  because  of  the  inverse 

node.   The  final  tree  is  shown  in  Figure  (8-7),  which   leads 

to   the   following   equations   in  which  the  number  of  matrix 

multiplications  is  reduced  25  percent. 

a  =  F-K 

x  =  F«x  +  a-(z  -  H'x) 

P  =  (F-a-H)-P-FT  +  G-Q«GT 

T 
b  =  P-H 

K  =  b« (H'b+R)-1 
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DUBLICATE  SUBTREE 

MAXIMAL  P-SUBTREE 


Figure  8-6.     Initial  Expression  Tree. 
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Figure  8-7.     Final  Expression  Tree 
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Matrix  inversion  is  a  computationally  expensive 
operation  with  a  time  complexity  of  0(n  )  for  the  general 
case  in  serial  form.  Since  matrix  inversion  is  important  in 
many  applications,  it  has  been  the  subject  of  many  studies 
to  determine  efficient  algorithms  for  both  serial  and 
parallel  computation.  Many  of  these  algorithms  involve  a 
decomposition  or  transformation  of  the  matrix  into  a 
different  form  prior  to  inversion.  Iterative  techniques 
such  as  Gauss-Seidel  and  Newton-Raphson  are  available  for 
solving  systems  of  simultaneous  equations,  but  are  not 
normally  used  for  matrix  inversion.  However,  in  this 
particular  application  an  iterative  technique  proved 
superior  because  a  very  good  intial  estimate  of  the  desired 
inverse  was  available  from  the  previous  step. 

The  matrix  to  be  inverted  in  the  Kalman  filter   problem 
is 

A  =  H-P-HT  +  R 

where  R  is  a  constant  matrix  and  H  is  a  constant  vector.  P 
is  a  matrix  which  changes  only  slowly  at  each  evaluation 
step.  Thus,  an  excellent  initial  guess  for  A-inverse  is  the 
previous  value  of  A-inverse.  The  iterative  algorithm  is 
specified  by 

D(k+1)  =  D(k) -[2-1  -  A-D(k) ] 

where  A  is   the   original   matrix   and   D   is   the   eventual 
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inverse.  Simulation  studies  have  shown  this  algorithm  to  be 
surprisingly  robust  for  this  particular  application, 
requiring  only  three  to  four  iterations  during  a  normal 
filter  evaluation  step  [96]. 

Using  the  final  expression  tree  shown  in  Figure  (8-7) 
and  the  inversion  algorithm,  described  previously,  the 
Kalman  filter  equations  can  be  implemented  using  the 
systolic  arrays  presented  earlier.  Table  (8-1)  compares  the 
computation  time  for  each  of  the  three  algorithms  at  each 
step  and  the  final  computation  times  are  presented  in  terms 
of  the  system  size  n,  assuming  that  four  iterations  are 
required  for  the  computation  of  the  inverse  matrix.  All 
matrices  and  vectors  are  assumed  to  be  of  the  same  size. 

The  hierarch  of  the  operations  is  very  important  for 
optimal  computational  time  minimization.  Looking  at  Table 
(8-1),  it  is  observed  that  in  operations  2,  7  and  9, 
multiplications  were  paired,  taking  advantage  of  the 
pipelined  architecture.  Also  note  that  the  hierarch  of  the 
operations  were  chosen  is  such  a  manner,  so  that  the 
resultant  matrix  of  an  operation  was  not  used  immediately  as 
a  loading  matrix  in  the  next  operation.  The  only  time  that 
this  was  unavoidable  was  in  operation  9a,  where  (F-a*H)  was 
used  as  a  loading  matrix  for  operation  10  and  the  proper 
delay  [(n-2)-l]  was  taken  into  account. 

The  three  algorithms  were  compared,  in  terms  of 
computation   time,   for  filters  of  size  2-60  and  the  results 


Table  8-1.   Kalman  Filter  Execution  Times, 
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OPERATION 

SIZE 

NEW 

LIU'S 

RUNG ' S 

1 

T 
Q-G1 

(nxn) 

■ ( nxn ) 

2n 

4n-2 

4n-l 

2a 

P-H  =b 

(nxn) 

• ( nxn) 

2n 

4n-2 

4n-l 

2b 

T 
P«F 

( nxn ) 

• ( nxn) 

n 

n 

4n-l 

3 

R+H-=A 

( nxn ) 

• ( nxn ) 

2n 

4n-2 

5n-l 

4 

G-(Q'GT) 

( nxn ) 

• ( nxn ) 

2n 

4n-2 

4n-l 

5 

D(k+l)=D(k) •[2l-A«D(k) ] 
INVERSE  (4  ITERATIONS) 

( nxn ) 

• ( nxn ) 

16n 

32n-16 

36n-8 

6 

b-D=K 

( nxn ) 

( nxn ) 

2n 

4n-2 

4n-l 

7a 

F«K=a 

(nxn) 

( nxn) 

2n 

4n-2 

4n-l 

7b 

F-x 

( nxn) 

( nxl ) 

1 

1 

4n-l 

8 

z-H«x 

(nxn) ■ 

(nxl ) 

n+1 

3n-l 

3n+l 

9a 

F-a«H 

( nxn) - 

( nxn ) 

2n 

4n-2 

5n-l 

9b 

( F«x)+a- ( z-H-x)=x 

( nxn) < 

(nxl ) 

1 

1 

3n+l 

10 

G • Q • GT+ ( F-a • H ) • ( P • FT ) =P 

( nxn ) • 

( nxn ) 

3n-3 

4n-2 

5n-l 

TOTAL 

35n 

68n-8 

)4n-15 
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Figure  8-8.     Comparison  of  Execution  Times. 
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are  shown  in  Figure  (8-8).  Stars  designate  Kung's  systolic 
architecture,  triangles  represent  Liu's  and  Young's 
algorithm,  and  the  straight  line  illustrates  the  new 
pipelined  systolic  architecture.  Clearly  the  new 
architecture  is  the  fastest,  requires  the   least   amount   of 

computational   units  and  requires  exactly  the  same  number  of 

2 
PE  elements  as  the  other  two  architectures,  i.e.,  n   cells. 


CHAPTER  NINE 
SUMMARY  AND  CONCLUSIONS 


The  theory  of  optimal  filtering,  and  especially  Kalman 
filtering,  has  been  extensively  researched  since  it  was 
introduced  in  early  1960's.  The  purpose  of  this 
dissertation  was  to  develop  and  study  high-speed, 
high-precision,  adaptive  Kalman  filters. 

Recent  developments  in  the  area  of  logarithmic  number 
systems  ( LNS )  have  proven  it  to  offer  some  significant 
advantages  over  other  number  weighted  systems  like  the  fixed 
or  floating  point.  The  main  advantages  of  LNS  are  the 
extended  dynamic  range  on  which  they  operate  and  the  high 
speed  of  operations  accompanied  by  a  remarkable  regular  data 
flow. 

The  implementation  of  a  Kalman  filter  using  the  LNS 
arithmetic  was  developed.  In  addition,  an  analytical  method 
of  determining  the  degradation  in  performance  of  LNS  Kalman 
filters  was  presented.  The  analytical  results  compare 
favorably  with  the  results  of  the  simulation  study. 
Comparison  between  the  LNS  implemented  Kalman  filter  and  the 
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conventional  fixed  point  Kalman  filter  was  made  in  terms  of 
accuracy  and  speed.  The  superiority  of  the  LNS  Kalman 
filter  was  apparent  in  both  accuracy  and  throughput. 

For  a  Kalman  filter  to  yield  optimal  performance  it  is 
necessary  to  provide  the  correct  a  priori  descriptions  of 
the  input  and  output  noise  covariances,  normally  denoted  Q 
and  R.  An  algorithm  has  been  developed  for  identifying 
unknown  noise  covariances  in  Kalman  filter  applications. 
The  algorithm  makes  use  of  the  first  and  second  terms  of  the 
autocorrelation  function  of  the  innovation  process.  By 
determining  first  the  ratio  of  the  unknown  covariances, 
their  actual  values  can  be  predicted.  Implementation  of  the 
algorithm  is  possible  and  real-time  throughput  can  be 
achieved  with  the  use  of  fast  autocorrelation  and  search 
algorithms.  The  proposed  algorithm,  when  compared  to 
Mehra's  and  Carew's  algorithms,  was  shown  to  be 
architecturally  simple,  and  to  achieve  higher  levels  of 
speed  and  convergence  performance  than  comparative  methods. 

Finally,  the  implementation  of  the  Kalman  filter  using 
systolic  array  architectures  was  investigated.  A  systolic 
architecture  is  a  methodology  for  mapping  high-level 
computations  into  hardware.  It  has  received  much  attention 
recently  as  a  means  of  achieving  high  data  rates.  Among  the 
many  applications  systolic  designs  have  been  used  to  perform 
matrix  operations  making  them  suitable  for  Kalman  filtering 
processing.   Using  an  orthogonal  systolic  array  processor,  a 
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new  algorithm  was  developed  which  increases  the  throughput 
of  matrix  operations.  The  new  algorithm  was  compared  with 
other  orthogonal  systolic  architectures  on  the 
implementation  of  Kalman  filters  and  it's  superiority  was 
apparent. 

The  result  of  this  study  was  a  very  high  performance 
Kalman  filter  capable  of  processing  data  in  the  tens  of 
megahertz  range  while  providing  floating-point  precision 
using  small  wordlength  arithmetic  units.  The  system 
proposed  also  lends  itself  to  VLSI  implementation  and  can  be 
integrated  into  a  fixed  or  dynamically  allocated  data  flow 
architecture.  Since  adaptive  filtering  and  estimation  is 
intrinsic  to  many  present  and  future  control  systems,  this 
capability  is  needed. 

Future  research  in  this  area  should  be  devoted  in  the 
design  of  reconf igurable  systolic  architectures  for  Kalman 
filters.  These  architectures  would  improve  reliability  for 
real-time  Kalman  filter  applications,  by  replacing  faulty 
processors  on  line. 
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